Watch the ocean

Long-term monitoring is essential for working out how changes in the Atlantic Ocean current system will affect the planet.


The Atlantic meridional overturning circulation (AMOC) has spurred scientific interest and human imagination for decades.

A complex and fundamental system of ocean currents, including the wind-driven Gulf Stream, the AMOC influences the exchange of heat between the tropics and high latitudes. Driven mainly by cold, dense water in the salty Greenland and Labrador seas sinking to the bottom of the North Atlantic Ocean, the circulation regulates temperature and so serves as a global thermostat.

But for how much longer? Potential sharp changes in the circulation have been identified as a possible tipping point in Earth’s physical systems. Since the 1950s, geologists and oceanographers have been gathering convincing evidence that alterations in ocean circulation are a key determinant of climate change.

Ice-core records from Greenland suggest that abrupt shifts in circulation strength triggered dramatic temperature fluctuations during the last glacial period. Climate fluctuations on such a scale have, fortunately, not occurred in the present Holocene interglacial era. Still, signs of a markedly weakening AMOC, reported in 2005 (H. L. Bryden et al. Nature 438, 655-657; 2005), provoked concern that the circulation might be on the brink of tipping into a weak phase once again, possibly as a result of human-induced climate warming.

Subsequent ocean observations, from arrays of sensors strung across the North Atlantic, offered a more reassuring picture: the current was hugely variable, and so a single snapshot could be unrepresentative.

Researchers have now gone back and taken another look. In a paper in Nature this week, scientists present palaeo-oceanographic evidence that deep convection of surface waters in the North Atlantic (the engine that keeps the AMOC in constant motion) began to decline as early as around 1850, probably owing to increased freshwater influx from Arctic ice that had melted at the end of a relatively cold period called the Little Ice Age (D. J. R. Thornalley et al. Nature 556, 227-230; 2018). This could have caused a weakening in the ocean circulation.

In a second paper, researchers used global climate models and data sets of sea surface temperature to date the onset of the weakening to more recent times, around the mid-twentieth century (L. Caesar et al. Nature 556, 191-196; 2018). According to their models, the slowdown was about 15%, was most pronounced during winter and spring, and has led to a cooling of sea surface temperatures in parts of the northern Atlantic, together with a slight northward shift of the mean Gulf Stream path. This, the authors say, is probably a consequence of anthropogenic climate change.

Importantly, the findings agree that the AMOC is in a relatively weak state. The wide margin of disagreement between the two independent studies on when the circulation started to weaken is probably due to the different methods used, and it highlights how immensely difficult it is to capture the AMOC’s past variability. This will probably frustrate those who prefer their science to send a clear signal. But then, science is rarely so obliging. Can the effects of climate change and natural variability on the AMOC be disentangled? And if the ocean circulation is sensitive to climate change, as is highly likely, will the currents respond abruptly and perhaps violently at some point, or will the transition be smooth? These are among the most pressing questions in climate science.

The slow progress on answering them should offer a stark reminder that the oceans are the most under-sampled component of the Earth system. The AMOC is just one part of a world-spanning circulation system whose physics, and whose influence on chemical cycling, are only poorly understood.

Numerical models are an indispensable tool for studying ocean circulation and climate. But despite ever-increasing computer power, models fall short when it comes to reconstructing something as nuanced and variable as ocean circulation. Long-term, serial measurements of circulation strength are what is needed.

It is crucial, therefore, that existing ocean monitoring systems, including the Overturning in the Subpolar North Atlantic Program and the South Atlantic Meridional Overturning Circulation programme, are maintained for decades to come. Data from these arrays of monitoring instruments are just beginning to shed light on the complex water flows in key ocean regions. Yet securing funding for lengthy studies is an ongoing fight.

There is more to be done. A United Nations sustainable development goal already includes a call for greater research capacity for promoting ocean health. Regional and national ocean-observation efforts should be coordinated, ideally under the Global Ocean Observing System. Meticulous observation is a prerequisite for understanding the oceans on which, ultimately, humankind depends. ■ SEE NEWS & VIEWS P.180


Cosmic sirens 


Gravitational waves could help us understand differing measurements of the Universe.


Cosmology has come a long way since Edwin Hubble determined the rate of cosmic expansion around 90 years ago. Since the 1990s, multiple independent techniques have converged on values much lower than Hubble’s. They differ by less than 10%, but the differences seem to be statistically significant (3.7 standard deviations). Innovative techniques, including the detection of gravitational waves from stellar collisions such as one that astronomers witnessed last August, should settle the question in the next few years. The answer could contain some new and unexpected physics.

In our expanding Universe, a galaxy’s rate of recession from our own can be measured easily from its redshift: how much its light waves stretch as they travel, owing to the expansion of the intervening space. The difficult part is measuring the galaxy’s distance. With his early techniques, Hubble discovered that most galaxies seem to recede at a rate proportional to their distance. His ‘Hubble constant’ quantifies that proportion. Today’s state-of-the-art observations suggest that, on average, galaxies’ speeds increase by 73.5 kilometres per second for every megaparsec (3.26 million light years) of distance. Thus, for example, galaxies 100 megaparsecs away recede at around 7,350 km s⁻¹.
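The proportionality Hubble found can be written as v = H0 × d. As a rough check of the figures quoted here, the Python sketch below (our own illustration, not taken from any of the studies mentioned) multiplies the standard-candle value of 73.5 km s⁻¹ Mpc⁻¹ by a distance of 100 Mpc, and compares that value with the Planck estimate of 66.9. The function names are hypothetical, and the simple linear law is assumed, which is a good approximation only for relatively nearby galaxies.

```python
# Hubble's law, v = H0 * d, assumed valid for relatively nearby galaxies.
H0_CANDLES = 73.5  # km/s per Mpc, standard-candle estimate quoted in the text
H0_PLANCK = 66.9   # km/s per Mpc, Planck 2015 estimate quoted in the text


def recession_speed(h0_km_s_mpc: float, distance_mpc: float) -> float:
    """Return the recession speed (km/s) of a galaxy at distance_mpc."""
    return h0_km_s_mpc * distance_mpc


def fractional_gap(high: float, low: float) -> float:
    """Relative difference between two H0 estimates, using `low` as reference."""
    return (high - low) / low


if __name__ == "__main__":
    print(f"{recession_speed(H0_CANDLES, 100):.0f} km/s")  # 7350 km/s
    print(f"{fractional_gap(H0_CANDLES, H0_PLANCK):.1%}")  # 9.9%
    print(f"{(H0_CANDLES + H0_PLANCK) / 2:.1f}")           # 70.2
```

The gap of roughly 9.9% is consistent with the statement that the estimates differ by less than 10%, and the midpoint of the two values is about 70.2 km s⁻¹ Mpc⁻¹.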

This value of the Hubble constant comes from observing stars that act as standard candles. These have known intrinsic brightness, so their distance can be estimated from how bright they look in the sky. But the value of 73.5 clashes with the 66.9 estimated in 2015 by cosmologists who mapped the cosmic microwave background (the relic radiation from the Big Bang) using the Planck observatory of the European Space Agency (ESA). The discrepancy could still turn out to be caused by unknown artefacts of the measuring techniques, but both camps say that they are increasingly confident in their results.
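Standard candles work because apparent brightness falls off with the square of distance. Astronomers usually express this through the distance modulus, m − M = 5 log10(d / 10 pc), where m is the apparent magnitude and M the known absolute (intrinsic) magnitude. The Python sketch below inverts that relation. The star’s magnitudes in the usage example are hypothetical round numbers chosen for illustration, and the sketch ignores dust extinction and the other corrections that real distance-ladder analyses must make.

```python
def distance_pc(apparent_mag: float, absolute_mag: float) -> float:
    """Invert the distance modulus m - M = 5 * log10(d / 10 pc).

    Returns the distance in parsecs. Dust extinction is ignored here,
    which is a simplifying assumption.
    """
    return 10 ** ((apparent_mag - absolute_mag) / 5 + 1)


def flux_ratio(d_near: float, d_far: float) -> float:
    """Inverse-square law: the fraction of flux an identical candle keeps
    when moved from d_near to d_far (any consistent distance unit)."""
    return (d_near / d_far) ** 2


if __name__ == "__main__":
    # Hypothetical candle: absolute magnitude -4, observed at magnitude 26.
    d = distance_pc(26.0, -4.0)
    print(f"{d / 1e6:.1f} Mpc")  # 10.0 Mpc
    # Twice as far away means a quarter of the flux.
    print(flux_ratio(1.0, 2.0))  # 0.25
```

If the assumed intrinsic brightness is wrong, every distance inferred from it shifts together, which is why the reliability of the candles matters so much to the value of the Hubble constant.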

The Planck estimate relies on what is known as the standard model of cosmology. It makes assumptions regarding the composition of the Universe, and in particular the content of dark matter and the nature of dark energy, the mysterious driver of the acceleration of the cosmic expansion. So, if the discrepancy holds up, it could point to entirely new physics, implying that dark matter is stranger than physicists had assumed, or that the effects of dark energy change with time.

By contrast, some wonder whether standard candles might not be as reliable as astronomers think. This month, another ESA mission, the Gaia telescope, will release a 3D map of the Milky Way that has unprecedented precision and depth, and will help astronomers test the reliability of these cosmic signposts. But, ideally, astronomers would like to have more direct ways of measuring distances outside our Galaxy.

Enter gravitational waves. These stand ready to address some classic astronomical challenges with strong new evidence, as described in a News Feature on page 164. They might also help to resolve the issues surrounding the cosmic expansion. Health warning: these possibilities are speculative and controversial.

When two cosmic orbs, such as the neutron stars seen merging last August, spiral into each other, they emit gravitational waves that carry information about their distance, constituting a ‘standard siren’. This enabled physicists at the US-based Laser Interferometer Gravitational-Wave Observatory (LIGO) to calculate the Hubble constant.

They obtained a value of 70, smack in the middle of the standard-candle and cosmic-microwave-background estimates. LIGO’s data point has a large margin of error, but, as researchers collect more of these events, the results might end up leaning conclusively one way or the other.
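Each standard siren yields one estimate, H0 = v / d, with the distance d read from the gravitational-wave signal and the recession speed v from the host galaxy’s redshift. A single event carries a large error bar, but independent events can be combined. The Python sketch below uses a standard inverse-variance weighted mean to show why collecting more events helps: with equal uncertainties, the combined error shrinks as one over the square root of the number of events. The 25 events and their 10 km s⁻¹ Mpc⁻¹ uncertainties are hypothetical values chosen for illustration, not LIGO data, and real analyses must also handle complications such as the galaxies’ own motions and the way the inferred distance depends on the orbit’s inclination.

```python
import math


def siren_h0(recession_km_s: float, distance_mpc: float) -> float:
    """One standard-siren estimate: H0 = v / d (km/s per Mpc)."""
    return recession_km_s / distance_mpc


def combine(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted mean of independent (value, sigma) pairs.

    Returns (mean, sigma_of_mean).
    """
    weights = [1 / sigma**2 for _, sigma in estimates]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1 / total)


if __name__ == "__main__":
    # Hypothetical: 25 events, each with a 10 km/s/Mpc uncertainty, all centred on 70.
    events = [(70.0, 10.0)] * 25
    mean, sigma = combine(events)
    print(f"H0 = {mean:.1f} +/- {sigma:.1f}")  # H0 = 70.0 +/- 2.0
```

Twenty-five equally uncertain events cut the error by a factor of five, which illustrates how a growing catalogue of sirens could tip the balance towards one camp or the other.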

Ultimately, gravitational waves could enable researchers to measure not just the current cosmic expansion, but also how the rate of expansion has evolved over the aeons. Two upcoming ESA projects will help enormously, especially if they get to fly at the same time, as many researchers hope. The gravitational-wave detector LISA (Laser Interferometer Space Antenna) should detect mergers of black holes across the Universe’s history. And some astronomers anticipate that the X-ray observatory Athena (Advanced Telescope for High-Energy Astrophysics) might pick up photons from the same events and help researchers find the corresponding galaxies’ redshifts, although others consider this a long shot.

Mapping standard sirens in this way should shed light on the nature of dark energy, cosmologists’ most coveted goal. They hope that it will provide hints about the future of the Universe. Predictions for an infinitely long-lasting future are outside the realm of science. But cosmologists could still work out whether cosmic expansion will continue to accelerate for the foreseeable future, or whether that acceleration might increase, stop or perhaps reverse. ■


Awards to celebrate women in science


Female scientists are under-represented in global research. Nature has long argued the need for initiatives to increase their opportunities and participation, so we are delighted to announce an awards programme that aims to do both.

The two annual awards will recognize inspirational early-career female researchers and those who have worked to champion young women’s and girls’ participation in science. By rewarding and celebrating these achievements, we hope the programmes will contribute to a positive shift towards the equity sorely needed in the research community.

The first is called the Inspiring Science Award and will honour female scientists who have completed their PhD within the past ten years and have made an exceptional contribution to scientific discovery, as reflected in publications, poster and conference presentations, leadership, tutoring and mentoring. Candidates can be nominated by anyone in their research institute, and we encourage nominations from around the globe and across all subject areas. Our independent judging process will ensure that those working under adverse circumstances or in regions where there is limited access to scientific literature will not be unfairly disadvantaged.

The second prize, the Innovating Science Award, recognizes individuals or organizations that have led a grass-roots initiative to support increased access to, or interest in, science, technology, engineering and mathematics (STEM) for girls and young women around the globe. This reflects our belief that supporting early interest in STEM worldwide is a crucial step towards sustainably increasing the representation of women in these subjects. Candidates for this award can nominate themselves.

Nominations opened on 9 April and will close on 11 June 2018. A longlist of ten nominees for each award will be announced on 24 July, and a shortlist of five will be announced on 4 September. Both awards are run by Nature Research in partnership with The Estée Lauder Companies. (Full details of the criteria and nomination processes are available at nature.com/researchawards.)

The winners of the awards will be announced in October. They will receive grants of US$10,000 to build on their efforts, and an invitation to an award ceremony. The Inspiring Science Award winner will also receive a grant of up to $5,200 to support open-access publication of their research, and the Innovating Science Award winner will receive up to $5,200 to support an event that showcases their initiative. These awards complement the existing Nature Awards for Mentoring in Science and the John Maddox Prize for promoting sound science and evidence on a matter of public interest.

Nature strives to champion and showcase the achievements of researchers, and we have a responsibility to drive positive change in the research community. Our journals are committed to supporting gender equity (see go.nature.com/2glxtdj for a collection of related content). We recognize that a huge amount must be done to overcome the many barriers that women face to entry and progression in research; these awards are just one small contribution. We look forward to identifying outstanding individuals who are deserving of these awards, celebrating their achievements and sharing their stories. ■


Demand cancer drugs that truly help patients

Drug regulators and trial designs should assess benefits that actually matter to people with cancer, says Ajay Aggarwal.

Already this year, the US Food and Drug Administration (FDA) has approved or extended the use of several cancer drugs that have yet to show they will prolong life or improve its quality. Unfortunately, there is no guarantee that such benefits will be demonstrated over time, and these drugs, like most cancer treatments, increase the risk of side effects such as diarrhoea and susceptibility to infection.

In my view, regulators should ensure that drugs benefit patients before allowing them to persist on the market.

As an oncologist, I sometimes have patients show me headlines that describe new cancer drugs with words such as ‘game changer’ and ‘breakthrough’. Like my patients, I’m excited to see what therapies are on the horizon. Unfortunately, these words are rarely the ones that come to mind when I appraise evidence from clinical trials. Many trials aimed at getting drugs to market depend on surrogate end points such as slowed tumour growth. However, a drug that shrinks tumours might not help to extend people’s lives. This is why most oncology drugs enter the market without clear evidence that they improve either the quality or the length of life.

In 2017, my colleagues and I completed a study of all 48 cancer drugs approved by the European Medicines Agency between 2009 and 2013 (C. Davis et al. Br. Med. J. 359, j4530; 2017). Of the 68 clinical indications for these drugs (reasons to use a particular drug on a patient), only 24 (35%) demonstrated evidence of a survival benefit at the time of approval. Even fewer provided evidence of an improved quality of life for symptoms such as pain, tiredness and loss of appetite (7 trials; 10%). Most indications (36 of 68) still lacked such evidence three or more years after approval. Other groups in other regions have observed similar trends. For example, a 2015 study demonstrated that only a small proportion of cancer drugs approved by the FDA improved survival or quality of life (C. Kim and V. Prasad JAMA Intern. Med. 175, 1992-1994; 2015).

Once the medicines appear on the market, companies and patient advocates argue that any delay in governments covering costs for these drugs will bring about pain, suffering and unnecessary deaths, even when benefits have not been demonstrated.

If a drug does offer benefits, clinical trials are usually the best setting for these to shine through. People with cancer who are enrolled in clinical trials tend to be younger and much fitter than the general patient population. Because side effects are often worse for older or less-fit patients, benefits might not be realized or noticed in typical care settings.

Clinical trials, drug regulation and the field of medicine are all complicated. Societal values vary by country; improved survival rates might be assessed differently for different cancers, depending on how long people diagnosed with cancer are expected to live. Studies show that people’s expectations about a drug’s ability to extend life often far exceed what is observed. Clinicians should have honest conversations with patients to learn what constitutes a meaningful benefit for each individual.

When we choose treatment options for advanced cancer (the main 
indication for new cancer-drug approvals), we must consider that 
toxicities related to treatments may shorten life expectancy rather than 
extend it, and we should ensure that treatments do not diminish quality 
of life. Unfortunately, many clinical studies either neglect quality-of-life 
measures entirely or rely on unvalidated instruments. One seminal study 
demonstrated that people with advanced lung cancer who had early access 
to palliative care alongside standard treatments had greater improvements 
in quality of life and survival, despite receiving fewer aggressive end-of-life 
treatments (J. S. Temel et al. N. Engl. J. Med. 363, 733-742; 2010). 

Regulators also need to focus on more measures that people value: 
reduced toxicity, and the ability to maintain enough function to return 
to work or keep up social ties. 

Some argue that randomized, controlled trials with meaningful measures would take too long. However, there have been innovations in designing robust trials measuring overall survival and quality of life, even in slowly progressing diseases such as prostate cancer.

Approvals that let drugs stay in the marketplace on the basis only of quick, easy surrogate end points are unlikely to produce highly effective treatments; we will simply get more drugs providing marginal value.

I believe that the low bar also undermines innovation and wastes money. Copycat drugs with minimal benefits will continue to be approved on the basis of surrogates, minimizing incentives for true breakthroughs and game changers. At the same time, a large influx of drugs bringing limited benefit will force governments to spend a greater proportion of health funding on cancer drugs rather than on other treatment options.

Another risk is that emerging, heavily marketed drugs could blind clinicians and patients to existing options that might bring bigger benefits. It amazes me how much attention is given to drugs even as people with cancer struggle to access surgery and radiotherapy. Investment in screening and diagnostics research also falls far behind that of drug research.

Ultimately, I want access to the best available therapies for the people I treat: the ones most likely to bring meaningful improvements in their quality and length of life, and the ones that reduce the toxicity associated with treatment. Any new cancer therapy, drug or not, should undergo robust evaluation for outcomes that truly matter to individuals. As it is, limited finances are too often being diverted from evidence-based therapies to those that promise false hope. ■


Ajay Aggarwal is an oncologist at Guy’s and St Thomas’ NHS Trust, London, UK, and a senior lecturer at King’s College London.
e-mail: ajay.aggarwal@kcl.ac.uk


AI boycott ends

A group of artificial intelligence (AI) researchers has ended a boycott of collaborations with the Korea Advanced Institute of Science and Technology (KAIST) in Daejeon, South Korea. In a petition letter announced on 4 April, more than 50 researchers stated their concerns over KAIST’s Research Center for the Convergence of National Defense and Artificial Intelligence, which is operated with Hanwha Systems, a defence company in Seoul. The signatories said they would not work with KAIST until they had been assured that the centre “will not develop autonomous weapons lacking meaningful human control”. On 9 April, the group rescinded the boycott after KAIST president Sung-Chul Shin said that the university would not engage in research that is counter to “human dignity including autonomous weapons lacking meaningful human control”.


Scientist sentenced

A Chinese scientist was given a ten-year prison sentence by a US court on 4 April for stealing genetically altered seeds. Zhang Weiqiang, a Chinese national who is a US resident, was found guilty in February 2017 of stealing hundreds of seeds from his previous employer, Ventria Bioscience in Junction City, Kansas, and storing them in his home. Zhang was suspected of passing the seeds on to a Chinese crop-research institute. Plants grown from the modified seeds produce therapeutically valuable proteins such as human serum albumin, which is found in blood and needed in large quantities to replenish blood lost in injury or during surgery. Court documents say that Ventria Bioscience spent millions of dollars developing the strains. Zhang was found guilty of conspiracy to steal trade secrets and conspiracy to transport stolen property across state borders, and sentenced to 121 months in prison.


Suspected chemical attack in Syria

An international team has begun an investigation into a suspected chemical-weapons attack on civilians in Douma (pictured), a besieged town in Syria. It is reported that the attack on 7 April killed dozens of people and affected hundreds more. The Organisation for the Prohibition of Chemical Weapons (OPCW), which enforces the global treaty banning the use of such arms, said on 9 April that it would send inspectors to Syria on a fact-finding mission. Inspectors will try to identify any agents used, thereby confirming whether a chemical attack occurred. Physicians for Human Rights, a humanitarian organization in New York, called for immediate independent collection of environmental and biological samples from Douma. According to Human Rights Watch, there have been 85 chemical-weapons attacks in Syria since 2013, and the OPCW has confirmed the use of mustard gas and the nerve agent sarin in some instances. Since the civil war in Syria began in 2011, an estimated 400,000 people have been killed.


FUNDING

Fellowship launch

UK Research and Innovation (UKRI), Britain’s powerful new research-funding body, announced the creation of a fellowship scheme for early-career scientists on 3 April. The Future Leaders Fellowships, launched in the organization’s first week of operation, will by 2021 support 550 early-career researchers in any discipline, each funded for up to seven years. UKRI says the fellowships are in addition to existing schemes run by the individual research-funding councils, which have been subsumed under UKRI. The new scheme, which is open to researchers in academia and industry, and around the world, is designed to “develop, retain, attract and sustain research and innovation talent in the UK”.


Opioid research

On 4 April, the US National Institutes of Health (NIH) announced a US$1.1-billion research initiative aimed at curbing the opioid epidemic. The budget for the programme, called Helping to End Addiction Long-term (HEAL), is nearly double what the NIH spent on such research in 2016. HEAL goals include designing longitudinal studies to follow people with chronic pain, developing neuroimaging technologies to understand how pain manifests in the brain, testing new therapies for addiction and overdoses, and partnering with the US military and the Department of Veterans Affairs to develop non-pharmaceutical approaches to pain management. The NIH is also partnering with private drug companies to develop non-addictive painkillers.


Supercomputer cash

The US Department of Energy (DOE) announced up to US$1.8 billion in funding for at least two new supercomputers on 9 April. The project request follows an award last June for the Aurora supercomputing system, currently being developed at Argonne National Laboratory in Illinois. One of the systems resulting from the April announcement will be housed at the Oak Ridge National Laboratory in Tennessee, and the second will be at the Lawrence Livermore National Laboratory in Livermore, California. The two supercomputers would come online between 2021 and 2023. The DOE says funds could also support a third system or an upgrade to Aurora, depending on need.


Defence research

The number of Japanese research institutions that have procedures to evaluate the ethics of military-related research has doubled in the past year, according to a survey released by the Science Council of Japan on 4 April. Of the 135 Japanese institutes and universities that responded to the survey, 26% said that they now screen research for possible military applications, up from 13% in 2017. The council, which advises the Japanese cabinet, called for a boycott of military-related research in March 2017 after the government boosted funding for ‘dual use’ scientific research with potential military applications. In response to the funding boost, the council asked institutions to introduce evaluation guidelines for such research.


TREND WATCH

A survey of 1,839 current and former UK students has found that 41% experienced sexual misconduct, such as inappropriate comments, unwanted touching or assault, by staff at university. About 12% of current students said a staff member had touched them in a way that made them uncomfortable. Women were more likely than men, and postgraduates more likely than undergraduates, to report such harassment. The National Union of Students conducted the survey.

SEXUAL MISCONDUCT AT UK UNIVERSITIES (chart)
In a national survey of 1,839 current and former UK students, 41% said they had experienced sexual misconduct by staff. These experiences were particularly prevalent among postgraduates. Respondents were grouped as: have experienced sexual misconduct by staff (41%); have not experienced or are not aware of sexual misconduct by staff; know someone who has experienced sexual misconduct by staff.


FACILITIES

Whole genomes

The UK Biobank, which holds health records and other biological data for some 500,000 people, announced on 5 April that it plans to sequence the complete genomes of 50,000 participants. In 2017, the powerhouse database released limited genetic data for all participants, covering 800,000 DNA variants that tend to differ between individuals. These data have already been mined in dozens of studies looking for DNA variants linked to diseases and biological traits. The whole-genome sequencing, which will be funded by a £30-million (US$42-million) grant from the UK Medical Research Council, will allow for more-sophisticated studies. In January, the UK Biobank (pictured, a blood sample from the biobank) announced plans to release data on the exomes (the small portion of the genome that encodes proteins) of all participants by 2019.


Fusion-lab site

A laboratory near Rome will host a €500-million (US$618-million) experiment on nuclear fusion called the Divertor Tokamak Test (DTT) facility. Italy’s energy-research agency, ENEA, said on 4 April that it had selected its Frascati laboratory over eight other candidate sites in the country. The DTT will test technologies for extracting heat from fusion plasma. This is a necessary step in the development of fusion-based commercial power stations that the experimental ITER reactor, currently under construction in France, is not designed to take. ENEA plans to begin construction before the end of the year, and to fund the project in part with a €250-million loan it has requested from the European Investment Bank. It also expects a €30-million contribution from China and €60 million from the European EUROfusion consortium.


TECHNOLOGY 


Moon prize revived

XPRIZE of Culver City, California, is continuing its private competition to spur lunar exploration, even though a US$30-million version of the programme sponsored by Google ended on 31 March with no winners. The group said on 5 April that it was looking for new sponsors but would continue the competition without cash for now. Five teams were in the running for the Google Lunar XPRIZE, which required entrants to develop a vehicle capable of landing on the Moon, travelling 500 metres across the surface and broadcasting images and video back to Earth.


BUSINESS

Article-access tool

Scholarly-services firm Clarivate Analytics has purchased artificial-intelligence company Kopernio, which has a tool that helps researchers to find and access journal articles with one click. Clarivate, which owns the scholarly search engine Web of Science, announced the deal on 10 April but did not disclose its value. Kopernio offers a browser plug-in that makes it easier for researchers to find and download literature that they already have legitimate access to. The feature will be integrated into Web of Science to help users get around the problem of having to sign in to multiple sites to access articles available as part of an institutional subscription. It also logs academics’ access credentials so that they can access paywalled papers off campus.


NEWS IN FOCUS

Latest NASA satellite will search for nearby exoplanets p.158

UK data reveal big gender pay gaps among science employers p.160

Promising approach in cancer treatment hits snag p.161

The coming gold rush in gravity-wave research p.164


China’s new brain-science centre will host some 50 principal investigators and will also support external researchers.

Beijing launches pioneering neuroscience centre

Large research facility will be key part of much-anticipated brain initiative.

BY DAVID CYRANOSKI

For China, 2018 is shaping up to be a big year in brain science. Beijing announced plans last month to build a major neuroscience centre that will rival in size some of the world’s largest organizations in that discipline. It will also serve as a core facility for the country’s long-awaited brain project, China’s version of the high-profile brain-science initiatives under way elsewhere in the world.

The Chinese Institute for Brain Research was officially established in Beijing on 22 March, with an agreement signed by representatives of the Beijing municipality and seven research organizations based in the capital. The agreement named two neuroscientists, Peking University’s Rao Yi and Luo Minmin of the National Institute of Biological Sciences in Beijing, as co-directors.

The new Beijing facility will be one of the 
first concrete developments in China's national 
brain-research project, which has been under 
discussion for five years but has yet to be 
formally announced. The United States and 
Europe each launched billion-dollar brain 
initiatives in 2013, and Japan set up a smaller 
project the following year. South Korea 
answered with its own initiative in 2016. 

China is expected to complement these projects with its rapidly growing cadre of top neuroscientists, abundant supplies of research monkeys and big investments in brain-imaging facilities. “The brain is such a complex system that significant efforts are needed to tame this complexity at an international level,” says Katrin Amunts, scientific-research director of Europe’s Human Brain Project. China has the potential to provide important insights that relate to the work of other projects, she says.


PLANS AFOOT 

Luo says that he will oversee the roughly 
50 principal investigators who will have lab- 
oratories at the new centre, with Rao taking 
charge of external grants that will support 
around 100 investigators throughout China. 
Luo says that the centre will be similar in organization and scientific scope to the US National Institute of Mental Health, a major US brain-science funder, although on a smaller scale.

The Chinese centre will be a partnership 
between Beijing’s premier biomedical institu- 
tions, among them the Chinese Academy of 
Sciences, the Academy of Military Medical Sci- 
ences, Peking University and Tsinghua Univer- 
sity. Luo says it will support projects that use 
the latest biomedical methods, such as high- 
throughput single-gene sequencing, precision 
genome editing and big-data processing. He 
also hopes to develop better imaging tools, 
including a voltage sensor that can directly 
record neuronal activity, and high-speed- 
imaging microscopes that will allow detailed 
views of brain activity. 

This year, Luo plans to use 180 million 
Chinese yuan (US$29 million) provided by 
the Beijing municipal government to hire the 
first five or six research groups, and to install 
them in a building already constructed by the 
municipality, which is across the road from 
his institute. When operating at its full capacity of 50 researchers, which Luo plans to have within 5 years, some 400 million yuan per year will be needed. He hopes to secure this from the brain-science project, with a substantial amount still coming from Beijing.

Luo says that it will be a “docking site” for 
the Chinese brain project, which has been in 
planning since the United States and Europe 
launched their programmes. So far, few firm 
details about the project have been released. 
Scientists who spoke to Nature say they expect 
that the government will officially launch the 
initiative some time this year. 


STAFFING CHALLENGES 

In the meantime, other facilities are preparing 
their bids for support from the national pro- 
ject. A large science park under construction 
in Shanghai will house a ‘southern centre’ for 
neuroscience research. The centre’s organiz- 
ers say this will support many more principal 
investigators than its Beijing counterpart, 
which scientists are dubbing the northern 
centre. 

Feng Jianfeng, a computational biologist and 
head of Fudan University’s Institute of Science 
and Technology for Brain-inspired Intel- 
ligence, has been involved in organizing the 
Shanghai projects. He says that one focus will be using artificial intelligence (AI) to study brain diseases. Feng adds that, with 190 million yuan from the university, he is already setting up a brain-imaging facility that will house the largest number of magnetic resonance imaging devices in Asia, and will be based at the southern centre. AI algorithms will screen the images, comparing diseased brains with healthy ones, to form part of the world’s largest brain database, he says.

Another programme expected to be integral 
to the country’s brain-science initiative is an 
international connectome project, which is 
being designed by Mu-Ming Poo, director 
of the Institute of Neuroscience in Shanghai. 
Connectome projects attempt to map out all 
the neural connections in the brain. 

Finding enough researchers might be the 
greatest challenge for both the individual 
centres and the Chinese brain-science pro- 
ject. Jeffrey Erlich, a neuroscientist at NYU 
Shanghai, says that, as well as hiring top 
neuroscientists, the initiatives will need to 
fund postdoctoral positions and graduate- 
school research posts offering internationally 
competitive salaries. 

“That would increase the number of top students going into neuroscience,” says Erlich. “Then, in five to ten years, China could have a fresh crop of top young scientists.”


ASTRONOMY 


Exoplanet hunter will seek 
worlds close to home 


NASA’s mission is designed to spot planets orbiting nearby bright stars. 


BY ALEXANDRA WITZE 


Filling the shoes of NASA’s Kepler spacecraft won’t be easy. Since its launch in 2009, Kepler has discovered nearly three-quarters of the 3,700-plus known exoplanets. And there are thousands more candidates waiting to be confirmed.

So NASA is taking a different approach with its next planet-hunting mission. On 16 April, the agency plans to launch the US$337-million Transiting Exoplanet Survey Satellite (TESS), which will scrutinize 200,000 nearby bright stars for signs of orbiting planets. TESS will probably find fewer worlds than Kepler did, but they are likely to be more important ones.

“It’s not so much the numbers of planets that we care about, but the fact that they are orbiting nearby stars,” says Sara Seager, an astrophysicist at the Massachusetts Institute of Technology (MIT) in Cambridge and deputy science director for TESS.

TESS is meant to identify planets that
are close enough to Earth for astronomers 
to explore them in detail. Team scientists 
estimate that the spacecraft will discover 
more than 500 planets that are no more than 
twice the size of Earth (P. W. Sullivan et al. Astrophys. J. 809, 77; 2015). These worlds will form the basis for decades of further studies, including searches for signs of life. “We'll see a whole new opening of exoplanet studies,” Seager says.

Both Kepler and TESS are designed to scan 
the sky for planetary transits, the slight dim- 
ming that occurs when a planet moves across 
the face of a star and temporarily blocks 
some of its glow. For most of its mission, 
Kepler stared at a deep but narrow slice of the Universe, peering out some 920 parsecs (3,000 light years) from Earth but covering only 0.25% of the sky. Its celestial census showed that planets were common throughout the Milky Way. “We found that planets are everywhere,” says Elisa Quintana, an astrophysicist at NASA’s Goddard Space Flight Center in Greenbelt, Maryland.
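How faint is a transit? To first order, the fractional dimming equals the ratio of the planet's disk area to the star's, (R_p/R_s)^2. The sketch below is a textbook approximation, not either mission's detection pipeline. It assumes a uniformly bright stellar disk, ignoring limb darkening and grazing transits, and uses standard mean radii for Earth, Jupiter and the Sun.

```python
# Sketch: first-order transit depth, assuming a uniformly bright stellar disk
# (limb darkening and grazing transits are ignored).

R_SUN_M = 6.957e8       # nominal solar radius, metres
R_EARTH_M = 6.371e6     # mean Earth radius, metres
R_JUPITER_M = 6.9911e7  # mean Jupiter radius, metres


def transit_depth(planet_radius_m: float, star_radius_m: float) -> float:
    """Fraction of starlight blocked while the planet crosses the stellar disk."""
    return (planet_radius_m / star_radius_m) ** 2


earth_like = transit_depth(R_EARTH_M, R_SUN_M)
jupiter_like = transit_depth(R_JUPITER_M, R_SUN_M)

print(f"Earth-size planet, Sun-like star: {earth_like * 1e6:.0f} parts per million")
print(f"Jupiter-size planet, Sun-like star: {jupiter_like * 100:.2f}%")
```

The contrast is stark: an Earth-size planet dims a Sun-like star by roughly 84 parts per million, whereas a Jupiter dims it by about 1%. That gap helps explain why small planets are easier to confirm around bright stars, which deliver more light to measure.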


MEETING THE NEIGHBOURS 

By contrast, TESS will go shallow and broad, looking at stars within 90 parsecs of Earth but covering more than 85% of the sky. Its four cameras will give the spacecraft a field of view about 20 times the size of Kepler’s (see ‘Scanning the sky’). TESS will sweep the southern sky first and then, after a year, turn its attention to northern stars.
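The coverage figures can be checked with simple arithmetic. The sketch below depends on instrument numbers that do not come from this article: it assumes published mission specifications of roughly 105 square degrees for Kepler's active field and roughly 24 × 24 degrees for each TESS camera. The whole sky spans about 41,253 square degrees.

```python
import math

# Whole celestial sphere in square degrees: 4*pi steradians * (180/pi)^2.
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2  # about 41,253 deg^2
LY_PER_PARSEC = 3.26156

# Assumed instrument figures (mission specifications, not taken from this article).
KEPLER_FOV_DEG2 = 105.0              # Kepler's active field of view
TESS_CAMERA_FOV_DEG2 = 24.0 * 24.0   # each TESS camera sees roughly 24 x 24 degrees
TESS_CAMERAS = 4


def sky_fraction(area_deg2: float) -> float:
    """Fraction of the whole sky covered by a field of the given area."""
    return area_deg2 / FULL_SKY_DEG2


def parsecs_to_light_years(pc: float) -> float:
    """Convert a distance in parsecs to light years."""
    return pc * LY_PER_PARSEC


tess_fov = TESS_CAMERAS * TESS_CAMERA_FOV_DEG2
print(f"Kepler field: {sky_fraction(KEPLER_FOV_DEG2):.2%} of the sky")
print(f"TESS instantaneous field / Kepler field: {tess_fov / KEPLER_FOV_DEG2:.0f}x")
print(f"920 pc = {parsecs_to_light_years(920):,.0f} light years")
print(f"90 pc = {parsecs_to_light_years(90):,.0f} light years")
```

Under these assumptions, TESS's instantaneous field comes out at roughly 22 times Kepler's, consistent with "about 20 times". The 85% all-sky figure comes from stepping that field across the sky over the mission, which this sketch does not model.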

The observing swathes in each hemisphere will overlap at the south and north ecliptic poles, the two points on the sky that lie along the axis perpendicular to the plane of Earth’s orbit. That’s by design, because NASA’s James Webb Space Telescope, now planned for a 2020 launch, will also be able to study those regions at any
given time. Webb's 6.5-metre primary mirror will allow detailed spectroscopic studies of the planets’ atmospheres, but it will be in high demand for a range of other astronomical research. “The time on Webb is going to be so precious,” says George Ricker, an astrophysicist at MIT and TESS’s principal investigator.

SCANNING THE SKY
NASA’s Transiting Exoplanet Survey Satellite (TESS) will monitor 200,000 stars during its 2-year mission hunting worlds outside the Solar System. Researchers expect the craft to find more than 1,600 planets, including about 500 that are twice the size of Earth or smaller.
The spacecraft: solar panels, a Sun shade and four cameras, each containing seven stacked lenses.
ON THE LOOKOUT: TESS will seek new worlds by watching for the dimming that occurs when a planet passes across the face of its star.
REACHING ORBIT: a phasing orbit, then an engine burn to the final orbit. The antenna transmits data to Earth every two weeks, as the spacecraft flies past.
The Kepler craft searches up to 920 parsecs (3,000 light years) from Earth, but covers just 0.25% of the sky. TESS’s search area will extend 90 parsecs (300 light years) from Earth and cover 85% of the sky.
The Transiting Exoplanet Survey Satellite will search more than 85% of the sky.

Once TESS spots interesting planetary candidates, a fleet of Earth-based observatories will kick into action to gather more data. These will include planet-hunting stalwarts such as the HARPS instrument at the European Southern Observatory in La Silla, Chile, and the new Miniature Exoplanet Radial Velocity Array (MINERVA)-Australis, a group of five planned 0.7-metre telescopes near Toowoomba, Australia. “We have the ability to hammer on a target every night if we need to,” says Rob Wittenmyer, an astronomer at the University of Southern Queensland in Toowoomba who helps lead MINERVA-Australis.


These and other ground-based telescopes 
will be able to deduce the TESS planets’ masses, 
and from that their composition — whether 
they are rocky, icy, gassy or something else. 
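Follow-up telescopes such as HARPS measure how much a star wobbles along the line of sight. That radial-velocity signal gives the planet's mass, and combining the mass with the transit radius gives a bulk density, a first clue to composition. The sketch below uses the textbook circular-orbit relation and assumes a planet far lighter than its star. The planet's numbers are hypothetical. Real analyses also fit eccentricity and account for stellar-mass uncertainty and stellar activity.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_EARTH = 5.972e24   # kg
R_EARTH = 6.371e6    # m
SECONDS_PER_DAY = 86400.0


def minimum_mass_from_rv(k_m_s: float, period_days: float, star_mass_kg: float) -> float:
    """m_p * sin(i) from the radial-velocity semi-amplitude K.

    Assumes a circular orbit and a planet much lighter than its star:
    m_p sin i = K * (P / (2 pi G))**(1/3) * M_star**(2/3).
    """
    period_s = period_days * SECONDS_PER_DAY
    return k_m_s * (period_s / (2 * math.pi * G)) ** (1 / 3) * star_mass_kg ** (2 / 3)


def bulk_density(mass_kg: float, radius_m: float) -> float:
    """Mean density in kg/m^3, treating the planet as a sphere."""
    return mass_kg / (4 / 3 * math.pi * radius_m ** 3)


# Hypothetical transiting planet: K = 1 m/s, 10-day orbit, Sun-mass host,
# radius 1.5 Earth radii. A transiting orbit is nearly edge-on, so sin(i) ~ 1.
mass = minimum_mass_from_rv(1.0, 10.0, M_SUN)
density = bulk_density(mass, 1.5 * R_EARTH)
print(f"mass ~ {mass / M_EARTH:.1f} Earth masses, density ~ {density / 1000:.1f} g/cm^3")
```

Comparing the result with Earth's density of about 5.5 g/cm^3 is the kind of step that separates rocky worlds from gassy ones.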


A WHOLE NEW WORLD 

Recent research suggests that TESS may 
yield a greater bounty than once thought. 
Earlier this year, MIT astronomer Sarah 
Ballard recalculated how many planets TESS 
might find orbiting the cool, plentiful stars 
known as M dwarfs — and predicted some 
990 such planets, 1.5 times more than earlier estimates (S. Ballard Preprint at https://arxiv.org/abs/1801.04949; 2018). The sheer
volume of discoveries would allow astrono- 
mers to begin comparing broad classes of 
exoplanets: learning how stellar flares affect 
planetary atmospheres, for instance, or what 
sorts of planets surround stars of different 
ages. 

TESS will soon have company. The Euro- 
pean Space Agency (ESA) plans to launch its 
Characterising Exoplanet Satellite late this 
year. The craft will measure the sizes of known 
planets — from those a little bigger than Earth 
to ones that are roughly Neptune-sized — 
orbiting nearby bright stars. ESA is also plan- 
ning two missions for the 2020s: PLATO to 
study Earth-sized exoplanets, and ARIEL to 
study planetary atmospheres. 

The next generation of missions will come 
just in time: Kepler is on its last legs, with only 
a few months’ worth of fuel left to help it make 
its final discoveries.

UK wage data reveal
science’s gender gap 


Reports affirm systemic struggles for women in science. 


BY HOLLY ELSE 


Many UK science employers pay women much less than men, and some institutions are far less equal than others, according to an analysis conducted by Nature of statistics released last week. Universities, pharmaceutical companies, funders and other science-focused organizations maintain a gender pay gap that is 50% greater than the national average for all employers.

In 2017, the United Kingdom became one 
of the first nations in the world to require 
employers to report differences in pay between 
men and women. Organizations that employ 
more than 250 people must report details of 
their gender pay gap, the representation of 
men and women in each pay quartile and the 
gender breakdown of who receives bonus pay. 
More than 10,200 organizations have now 
uploaded data to the government's portal for 
gender pay-gap figures. 

The gender pay gap refers to the difference in 
the average hourly wage of all men and women 
across a workforce. It is not the same as unequal 
pay, when men and women are paid differently
for performing the same role, which has been
illegal in the United Kingdom since 1970. 
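Under the UK reporting rules, as we understand them, the gap is calculated as men's median (or mean) hourly pay minus women's, expressed as a percentage of men's pay. Employers also rank staff by hourly pay into four equal-sized quartiles and report the share of women in each. The sketch below illustrates both calculations with made-up wages. The official guidance defines "hourly pay" and the handling of ties in more detail than this.

```python
from statistics import mean, median


def pay_gap(men_hourly, women_hourly, stat=median):
    """Gender pay gap as a percentage of men's pay (positive = men earn more)."""
    men_pay, women_pay = stat(men_hourly), stat(women_hourly)
    return 100 * (men_pay - women_pay) / men_pay


def female_share_by_quartile(staff):
    """staff: list of (hourly_pay, is_female). Returns % women in each pay
    quartile, lowest-paid first. Ties and uneven group sizes are handled naively."""
    ranked = sorted(staff, key=lambda person: person[0])
    n = len(ranked)
    shares = []
    for q in range(4):
        group = ranked[q * n // 4:(q + 1) * n // 4]
        shares.append(100 * sum(is_female for _, is_female in group) / len(group))
    return shares


# Hypothetical hourly wages (GBP) for a 12-person workforce.
men = [12, 15, 18, 22, 30, 45]
women = [11, 13, 16, 19, 25, 28]
staff = [(p, False) for p in men] + [(p, True) for p in women]

print(f"median gap: {pay_gap(men, women):.1f}%")
print(f"mean gap:   {pay_gap(men, women, stat=mean):.1f}%")
print("women per quartile (%):", [round(s) for s in female_share_by_quartile(staff)])
```

In this toy workforce, the median gap is 12.5% but the mean gap is about 21%, largely because of the single £45 earner. That sensitivity to outliers is why the median is the headline figure.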

To see how science shapes up, Nature 
analysed data for universities, research insti- 
tutes, selected grant funders and some industrial 
employers (see ‘Research wage gap’).

Science institutions fared poorly overall. Of 
the 172 organizations included in the analysis, 
96% pay men more than women, according 
to the companies’ reported median pay gaps. 
Nationwide, 78% of all organizations favour 
men financially. The median gap between 
genders among science employers is 15%, 
compared with the UK median of 10%. The 
median offers the best representation of typical 
differences in pay, because it is not skewed by 
outlying high or low figures. 

The median pay gap for universities is 16%, 
research institutes 9%, funders 10%, industrial 
employers 12% and for five scientific publishers, 22%. (Macmillan Publishers, the division of Springer Nature that publishes Nature, has a median gender pay gap of 13%.)

Much of an organization’s pay gap comes 
down to how women and men are distributed through the ranks. Women are often over-represented in low-paid and low-skilled jobs, whereas men are likely to make up the bulk of workers in senior and high-paid roles. Stripping out non-academic roles from the university data would probably shrink the gap between male and female earnings, says Jeff Frank, an economist at Royal Holloway, University of London, in Egham, UK, who studies gender pay gaps in science.

RESEARCH WAGE GAP
UK companies with more than 250 employees had until 5 April to publish statistics on the gender pay gap. Nature analysed data from universities, pharmaceutical companies and other employers of scientists.
Science employers averaged a median pay difference of 15% in favour of men, compared with the UK-wide figure of 10%.
A lack of women in senior roles underlies many pay gaps. One report found that less than one-quarter of UK professors are female, even though women make up 45% of the academic workforce.
Nature's analysis is based on data from 122 universities (including Cambridge and Oxford colleges with more than 250 employees), 11 science institutes, 29 companies, 10 research funders and 5 science publishers.

The median gender pay gap for all academic 
staff at UK universities is 12%, according to a 
report released in 2017 by the University and 
College Union (see go.nature.com/2ezlidw). 
For professors at research-heavy institutions, 
the figure is 7%. The driving force for this pay 
gap, says the report, is a “very clear and continu- 
ous decline” in the proportion of women repre- 
sented as academic rank increases in seniority. 
Across the board, 45% of the entire academic 
workforce is female, for example, but less than 
one-quarter of all professors are women. 
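The report's central point is that a pay gap can appear even when men and women in the same role are paid identically, purely because of who holds which grade. The toy department below illustrates this. Its grades, pay rates and head counts are invented: women are 60 of 100 lecturers but only 10 of 50 professors.

```python
from statistics import median

# Hypothetical department: every grade pays men and women identically.
GRADE_PAY = {"lecturer": 25.0, "professor": 40.0}  # GBP per hour (made up)

# Hypothetical head counts by grade: (women, men).
HEADCOUNT = {"lecturer": (60, 40), "professor": (10, 40)}


def hourly_pay_lists(grade_pay, headcount):
    """Expand grade-level pay and head counts into per-person pay lists."""
    women, men = [], []
    for grade, (n_women, n_men) in headcount.items():
        women += [grade_pay[grade]] * n_women
        men += [grade_pay[grade]] * n_men
    return men, women


men, women = hourly_pay_lists(GRADE_PAY, HEADCOUNT)
gap = 100 * (median(men) - median(women)) / median(men)
print(f"median gap with equal pay for equal roles: {gap:.1f}%")
```

Despite equal pay at every grade, the median gap here is about 23%, driven entirely by the shortage of women in the higher-paid grade.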


FUNDING GATEWAY 

The London-based Wellcome Trust has a 21% 
median gender-pay gap, which it is seeking 
to close by training staff to help mitigate bias, 
and introducing fairer ways of recruiting, pro- 
moting and retaining women at senior levels. 
Couch says that this approach is important 
because her organization and other funding 
agencies act as the gatekeepers to science, and 
the Wellcome Trust could be missing out on 
supporting excellent ideas. Of the other nine 
funders Nature examined, three had non- 
existent or negligible pay gaps, including the 
Engineering and Physical Sciences Research 
Council and the Economic and Social 
Research Council. 

Across 29 research companies analysed 
by Nature, oil and gas businesses, including 
those owned by BP and Shell, generally had 
the largest median pay gaps and the lowest 
proportions of women in the top pay quartile. 

In the pharmaceutical industry, there is 
huge variation. One company, MSD (the UK 
subsidiary of Merck), has a 7% pay gap in 
favour of women; another, GlaxoSmithKline, 
reports small differences in pay, which favour 
men. Pfizer and AstraZeneca have more- 
typical gender pay gaps, at 18% and 13%, 
respectively. 

There is one obvious way to abolish gender 
pay gaps. One UK university eliminated its 
professorial pay gap overnight — by simply 
boosting women’s pay, says Alice Chilver, head 
of organizational development at University 
College London, who heard the anecdote in 
March, at a conference in London that she 
organized on gender pay gaps. 

Quick fixes aside, universities and other 
science employers might need to make more- 
significant changes to be able to close their 
gender pay gaps. “It is all about the culture, the 
behaviour, the habits and patterns, the beliefs, 
the fears,” Chilver says. “Until we can tap into 
and work with those cultures more effectively, 
change is going to be very slow.”

Puerto Rico’s statistics
agency in jeopardy 


Reorganization could threaten reliable, independent data about the island, critics say. 


BY GIORGIA GUGLIELMI 


Puerto Rico’s senators last week approved a plan to overhaul an
independent statistics agency 
tasked with coordinating the collec- 
tion and analysis of crucial data on 
the island. The reorganization will 
wreck the US territory’s ability to 
produce credible data about itself, 
including updated estimates of the 
death toll from last year’s Hurricane 
Maria, critics of the plan say. 

The decision paves the way for 
the restructuring of several govern- 
ment agencies, including the Puerto 
Rico Institute of Statistics (PRIS). 
To make it official, policymakers 
must now approve legislation dis- 
mantling the laws that established 
PRIS. Under Governor Ricardo Rosselló’s
plan to streamline government agencies, first 
introduced in January, PRIS would become an 
office in the Department of Economic Devel- 
opment and Commerce, which would contract 
the institute’s duties to private companies. 

Changes loom for body that handles statistics such as hurricane damage.

But some fear that privatizing official statistics isn’t in the island’s best interests. “The private companies are going to be chosen by the government and we don’t know how independent their leaders are going to be,” says Monica Feliu-Mojer, director of communications and science outreach at Science Puerto Rico, a non-profit group based in San Juan.
Another worry is that private companies 
might not distribute their data freely, or provide 
access to information on how they collected 
and analysed the numbers, says Steve Pierson, 
director of science policy at the American 
Statistical Association in Alexandria, Virginia. 
Since PRIS began operating in 2007, it has worked to improve the quality of government agencies’ statistics: the institute
trains statisticians in new method- 
ologies, ensures that data collection 
and analysis meet international 
standards and helps the agencies to 
make their data publicly accessible. 

PRIS has improved tracking of 
Puerto Rico’s mortality rate, and 
it established a fraud-prevention 
system related to the US Medicaid 
health-insurance programme, saving 
the government millions of dollars. 

But Rosselló disputes the agency’s effectiveness. PRIS “has failed in establishing efficient data gathering procedures that produce reliable statistics,” says Alfonso Orona, the governor's principal legal counsel.
He says that outsourcing data col- 
lection and analysis will help to 
address this. 

It’s likely that lawmakers will approve the 
legislation that would officially dismantle the 
institute, says Roberto Rivera, a statistician at the University of Puerto Rico at Mayagüez. Puerto Ricans are grappling with many issues, including the aftermaths of last year’s hurricanes and a series of education and labour reforms, so PRIS is not a priority, he says. “If there’s not enough pressure on the government, they'll get their way.”

THERAPEUTICS


Promising cancer drug hits snags 


Physicians struggle to identify which patients are likely to respond to cutting-edge therapy. 


BY HEIDI LEDFORD 


Cancer specialists in the United States had high hopes last year when they
gained approval for a new approach to
treatment: a drug that targeted certain tumours 
regardless of where they first appeared in 
the body. 

But clinicians and researchers are strug- 
gling to put that plan into practice. Although 
the drug itself works well against a variety 
of tumour types, there have been problems
with some of the tests used, which identify
suitable tumours on the basis of certain 
molecular markers. 

On 15 April at the American Association for 
Cancer Research annual meeting in Chicago, 
Illinois, researchers and representatives from 
the US Food and Drug Administration (FDA) 
will discuss how best to tackle the issue. “If you 
get a false negative result, you're not going to 
give that patient the therapy, which is terrible,” 
says Zsofia Stadler, an oncologist at the Memorial Sloan Kettering Cancer Center in New York City. “That’s why there’s such a debate.”

The drug in question, pembrolizumab 
(Keytruda), works by firing up the body’s 
immune responses against tumours. First 
approved by the FDA in 2014 to treat mela- 
noma, it has since been given the go-ahead to 
treat a handful of other cancers, including lung 
cancer. 

But last year, researchers reported that 
patients whose tumours had a disabled DNA- 
repair system also responded to the drug, 
regardless of where the tumour originated (D. T. Le et al. Science 357, 409-413; 2017). Damaged DNA can yield mutant proteins, which the immune system could target
as potential invaders. Scientists think that 
this increases the chances that immune cells 
unleashed by pembrolizumab will find and 
attack the tumour. 

In May 2017, the FDA allowed pharma- 
ceutical giant Merck of Kenilworth, New Jer- 
sey, to market pembrolizumab to people with 
advanced-stage cancer who had any solid 
tumour with that particular DNA-repair defect. 
“This is absolutely a breakthrough approval,” 
says Razelle Kurzrock, an oncologist at the Uni- 
versity of California, San Diego. “We have seen 
some dramatic responses in our patients.” 

But the three kinds of test commonly used to 
look for DNA damage arising from that defect 
can produce conflicting results, says Heather 
Hampel, a genetic counsellor at the Ohio State 
University in Columbus. One relies on PCR, 
a process that amplifies specific regions of the 
genome; a second looks for certain proteins; 
and a third relies on DNA sequencing. “Which 
is the best? Is any positive on any test sufficient?” 
Hampel says. “Does that mean you should try 
them all? No one wants to miss a patient who 
might benefit from pembrolizumab.”
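Hampel's question has a standard back-of-envelope answer if one assumes that the three tests make errors independently of one another. Tests run on the same tumour sample will not strictly meet that assumption. Under it, calling a patient positive when any test is positive cuts false negatives, but multiplies the chances of a false positive. The accuracy figures below are invented for illustration and are not measured values for PCR, protein or sequencing assays.

```python
from math import prod


def any_positive_rule(sensitivities, specificities):
    """Combined sensitivity and specificity if a patient counts as positive
    when ANY test is positive. Assumes the tests err independently."""
    miss_all = prod(1 - s for s in sensitivities)  # every test misses a true case
    combined_sensitivity = 1 - miss_all
    combined_specificity = prod(specificities)     # every test must stay negative
    return combined_sensitivity, combined_specificity


# Hypothetical accuracy figures for three different assays.
sens, spec = any_positive_rule([0.90, 0.85, 0.95], [0.98, 0.97, 0.99])
print(f"sensitivity {sens:.4f}, specificity {spec:.4f}")
```

With these made-up inputs, the "any positive" rule misses fewer than 1 in 1,000 true cases, but specificity falls to about 94%, below that of any single test.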

And there are signs that some of the tests 
might work better in certain tissues than in 
others, says Shridar Ganesan, a physician and cancer researcher at the Rutgers Cancer Institute of New Jersey in New Brunswick. PCR
assays, for example, look for changes in certain 
regions of DNA called microsatellites. Particular 
microsatellites might be more prone to damage 
in some tissues than in others, he says. 

Stadler notes that the degree to which the DNA changes might also vary from tissue to tissue: colon cancers tend to accumulate many mutations, whereas tumours in the adrenocortex generally have fewer. That can lead to a false negative result in tissues with fewer mutations, she says.

Similar complications might arise for some future tissue-agnostic drug approvals, particularly those based on DNA-repair
defects. This could include drugs called PARP 
inhibitors, which are approved in the United 
States for breast and ovarian cancers caused 
by mutations in either of two genes involved in 
DNA repair: BRCA1 or BRCA2. Researchers are 
looking at whether PARP inhibitors might also 
work in any solid tumour that carries similar 
DNA-repair defects, even if they aren't caused 
by BRCA1 or BRCA2 mutations. There are mul- 
tiple tests available for identifying the patterns 
of DNA damage in such tumours, says Hampel. 


Evidence has also been building that the 
overall number of mutations in a tumour could 
indicate how likely it is to respond to immuno- 
therapies such as pembrolizumab. Tests for this 
might also be complex, notes Stadler. 

Eventually, some of these issues will be ironed 
out, says Michael Overman, an oncologist with 
the University of Texas MD Anderson Cancer 
Center in Houston, as researchers gather data 
on which tests work best in which cancers. But 
the FDA was wise to move forward with the 
approval rather than wait for more evidence to 
sort out the issues with the molecular marker 
tests, he says. “There are still a lot of open ques- 
tions, but the therapy works exceptionally well,” 
he says. “It was the right thing to do.”


CORRECTIONS & CLARIFICATIONS 

The News story ‘Alzheimer’s study zeroes 
in on enigmatic protein’ (Nature 555, 
567-568; 2018) misstated the radioactive 
marker that will be used in the tau scans. It 
is GTP1, not GPT1. 

The News story ‘Copyright reforms draw 
fire from scientists’ (Nature 556, 14-15; 
2018) should have made it clear that when 
Vanessa Proudman talked of “that process” 
she was referring to how institutional 
repositories deal with copyright violations. 
After a clutch of historic detections, gravitational-wave researchers set their sights on some ambitious scientific quarry.

BY DAVIDE CASTELVECCHI

In the mid-1980s, Bernard Schutz came up with a new solution to one of astronomy’s oldest problems: how to measure the distance from Earth to other objects in the cosmos. For generations, researchers have relied on an object's brightness as a rough gauge for its distance. But this approach carries endless complications. Dim, nearby stars, for example, can masquerade as bright ones that are farther away.

Schutz, a physicist at the University of Cardiff, UK, realized that gravitational waves could provide the answer. If detectors could measure these ripples in space-time, emanating from interacting pairs of distant objects, scientists would have all the information needed to calculate how strong the signal was to start with, and so how far the waves must have travelled to reach Earth. Thus, he predicted, gravitational waves could be unambiguous markers of how quickly the Universe is expanding.

His idea was elegant but impractical: nobody at the time could detect gravitational waves. But, last August, Schutz finally got the opportunity to test this concept when the reverberations of a 130-million-year-old merger between two neutron stars passed through gravitational-wave detectors on Earth. As luck would have it, the event occurred in a relatively nearby galaxy, producing a much cleaner first measure than Schutz had dreamed. With that one data point, Schutz was able to show that his technique could become one of the most reliable for measuring distance. “It was hard to believe,” Schutz says. “But there it was.”
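To turn a distance into an expansion rate, astronomers compare it with how fast the host galaxy is receding, using Hubble's law, v = H0 × d. The sketch below treats the 130-million-year travel time quoted above as a distance of about 130 million light years, which is a fair approximation for such a nearby source. It pairs that distance with a hypothetical recession velocity. Published analyses must also correct for the galaxy's own motion and for distance uncertainty caused by the unknown tilt of the binary's orbit, which this sketch ignores.

```python
# Sketch of the Hubble-law arithmetic behind a "standard siren": H0 = v / d.
LY_PER_MPC = 3.26156e6  # light years in one megaparsec


def hubble_constant(recession_velocity_km_s: float, distance_mpc: float) -> float:
    """Estimate H0 in km/s per megaparsec from a single source."""
    return recession_velocity_km_s / distance_mpc


# Distance: light that left ~130 million years ago has travelled ~130 million light years.
distance_mpc = 130e6 / LY_PER_MPC
# Hypothetical recession velocity of the host galaxy, for illustration only.
velocity_km_s = 3000.0
print(f"d ~ {distance_mpc:.0f} Mpc, H0 ~ {hubble_constant(velocity_km_s, distance_mpc):.0f} km/s/Mpc")
```

A single event gives only a coarse value. Averaging many such sirens is what could sharpen the measurement.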

More mergers like that one could help researchers to resolve an 
ongoing debate over how fast the Universe currently is expand- 
ing. But cosmology is just one discipline that could make big 
gains through detections of gravitational waves in the coming 
years. With a handful of discoveries already under their belts, 
gravitational-wave scientists have a long list of what they expect 
more data to bring, including insight into the origins of the Uni- 
verse’s black holes; the extreme conditions inside neutron stars; a 
chronicle of how the Universe structured itself into galaxies; and 
the most-stringent tests yet of Albert Einstein's general theory 
of relativity. Gravitational waves might even provide a window 
into what happened in the first few moments after the Big Bang. 

Researchers will soon start working down this list, with the 
help of the US-based Laser Interferometer Gravitational-Wave 
Observatory (LIGO), the Virgo observatory near Pisa, Italy, and 
a similar detector in Japan that could begin making observations 
next year. They will get an extra boost from space-based inter- 
ferometers, and from terrestrial ones that are still on the drawing 
board, as well as from other methods that could soon start producing their own first detections of gravitational waves (see ‘The gravitational-wave spectrum’).

Like many scientists, Schutz hopes that the best discoveries will 
be ones that no theorist has even dreamed of. “Any time you start 
observing something so radically new, there’s always the possibility of seeing things you didn’t expect.”


SPINNING CLUES 

For a field of research that is not yet three years old, gravitational-wave astronomy has delivered discoveries at a staggering rate,
outpacing even the rosiest expectations. In addition to the dis- 
covery in August of the neutron-star merger, LIGO has recorded 
five pairs of black holes coalescing into larger ones since 2015 
(see ‘Making waves’). The discoveries are the most direct proof 
yet that black holes truly exist and have the properties predicted 
by general relativity. They have also revealed, for the first time, 
pairs of black holes orbiting each other. 

Researchers now hope to find out how such pairings came to 
be. The individual black holes in each pair should form when 
massive stars run out of fuel in their cores and collapse, unleash- 
ing a supernova explosion and leaving behind a black hole with a 
mass ranging from a few to a few dozen Suns. 

There are two leading scenarios for how such black holes could 
come to circle each other: they might start as massive stars in 
each other’s orbit, and stay together even after each goes super- 
nova. Or, the black holes might form independently, but be driven 
together later by frequent gravitational interactions with other 
objects — something that could happen in the centres of dense 
star clusters. 

Either way, the objects’ energy gradually disperses in the form 
of gravitational waves, a process that pulls the pair into an ever 
tighter and faster spiral, eventually fusing into one more-massive 
black hole. Ilya Mandel, a LIGO theorist at the University of Bir- 
mingham, UK, says that for LIGO and Virgo to see such pairs 
merge, typical black holes need to have started their mutual orbit 
separated by a distance of less than one-quarter that between 
Earth and the Sun. “If you start out with the two black holes any
farther apart, it will take longer than the age of the Universe” for
them to merge, Mandel says. 
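Mandel's rule of thumb follows from how steeply the inspiral time depends on separation. For a circular orbit, it scales as the fourth power of the distance between the two objects. The sketch below uses the standard leading-order (Peters) formula. The masses are illustrative, and eccentric orbits, which merge faster, are ignored.

```python
# Sketch: gravitational-wave inspiral time for a circular binary (Peters formula),
#   T = (5/256) * c^5 * a^4 / (G^3 * m1 * m2 * (m1 + m2)).
G = 6.674e-11     # m^3 kg^-1 s^-2
C = 2.998e8       # m/s
M_SUN = 1.989e30  # kg
AU = 1.496e11     # Earth-Sun distance, m
YEAR = 3.156e7    # s


def merger_time_years(m1_sun: float, m2_sun: float, separation_au: float) -> float:
    """Time for a circular binary to spiral together, in years."""
    m1, m2 = m1_sun * M_SUN, m2_sun * M_SUN
    a = separation_au * AU
    seconds = (5 / 256) * C**5 * a**4 / (G**3 * m1 * m2 * (m1 + m2))
    return seconds / YEAR


for sep in (0.25, 0.2, 0.1):
    t = merger_time_years(30, 30, sep)
    print(f"two 30-solar-mass black holes at {sep} AU: {t / 1e9:.1f} billion years")
```

For two 30-solar-mass black holes, the sketch gives roughly 23 billion years at one-quarter of an astronomical unit, longer than the Universe's age of about 13.8 billion years. At one-fifth of an astronomical unit, it gives roughly 9.5 billion years. Lighter black holes take longer still.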

The five black-hole mergers discovered so far are not suffi- 
cient to determine which formation scenario dominates. But in 
an August analysis of the first three detections, a group includ- 
ing Mandel and Will Farr, a theoretical astrophysicist and LIGO 
member at the University of Birmingham, suggested that just ten 
more observations could provide substantial evidence in favour 
of one scenario or the other. This would involve scrutinizing the
gravitational waves for clues about how black holes rotate: those
that pair up after forming independently should have randomly
oriented spins, whereas those with a common origin should have
spin axes that are parallel to each other and roughly perpendicular
to the plane in which they orbit.

MAKING WAVES

When two black holes or neutron stars spiral into each other, they produce
distinctive ripples in space-time called gravitational waves. Teams with LIGO’s
two detectors in the United States and with Virgo, LIGO’s counterpart
in Italy, have announced the detection of six events so far.

DECIPHERING A WAVE
When a signal is received, the frequency and rate of frequency change
provide information about the masses of the objects in the binary source.

With this information, physicists can
then determine how strong the
gravitational waves were at their origin.

Comparing the strength of the received wave
with that of the predicted one indicates how
far the waves have travelled through space.

ALREADY DETECTED BY LIGO AND VIRGO
Here are the binary mergers that the observatories have picked up
so far. Each discovery was named with the date it was detected.

Further observations could also provide insight into some of 
the fundamental questions about black-hole formation and stellar 
evolution. Collecting many measurements of masses should
reveal gaps — ranges in which few or no black holes exist, says 
Vicky Kalogera, a LIGO astrophysicist at Northwestern University 
in Evanston, Illinois. In particular, “there should be a paucity of 
black holes at the low-mass end”, she says, because relatively small
supernovae tend to leave behind neutron stars, not black holes, 
as remnants. And at the high end — around 50 times the mass 
of the Sun — researchers expect to see another cut-off. In very 
large stars, pressures at the core are thought eventually to produce 
antimatter, causing an explosion so violent that the star simply 
disintegrates without leaving any remnants at all. These events, 
called pair-instability supernovae, have been theorized, but so 
far there has been scant observational evidence to back them up. 

Eventually, the black-hole detections will delineate a map of 
the Universe in the way galaxy surveys currently do, says Rainer 
Weiss, a physicist at the Massachusetts Institute of Technology in 
Cambridge who was the principal designer of LIGO. Once the 
numbers pile up, “we can actually begin to see the whole Uni-
verse in black holes”, he says. “Every piece of astrophysics will get
something out of that.” 

To ramp up these observations, LIGO and Virgo have plans to 
improve their sensitivity, which will reveal not only more events, 
but also more details about each merger. Among other things, 


“WE CAN ACTUALLY BEGIN 
TO SEE THE WHOLE 
UNIVERSE IN BLACK HOLES” 


physicists are eager to see the detailed ‘ringdown’ waves that a 
post-merger black hole emanates as it settles into a spherical 
shape — an observation that could potentially reveal cracks in 
the general theory of relativity. 

Having more observatories spread around the globe will 
also be crucial. KAGRA, a detector under construction deep 
underground in Japan, might start gathering data by late 2019. 
Its location — and, in particular, its orientation with respect to 
incoming waves — will complement LIGO’s and Virgo’s, and ena-
ble researchers to nail down the polarization of the gravitational
waves, which encodes information about the orientation of the 
orbital plane and the spin of the spiralling objects. And India is 
planning to build another observatory in the next decade, made 
in part with spare components from LIGO. 

An even bigger trove of discoveries could come from observ- 
ing neutron-star mergers. So far, researchers have announced 
only one such detection, called GW170817. That signal, seen last 
August, was almost certainly the most intensely studied event in 
astronomy’s history. And it solved a number of long-standing 
mysteries in one stroke, including the origin of gold and other 
heavy elements in the Universe, as well as the cause of some
γ-ray bursts.

Further observations could allow scientists to explore the 
interiors of these objects. Neutron stars are thought to be as dense 
as matter can possibly be without collapsing into a black hole, but 
exactly how dense is anybody’s guess. No laboratory experiment 
can study those conditions, and there are dozens of proposals for 
what happens there. Some theories predict that quarks — the suba- 
tomic components that make up protons and neutrons — should 
break free from each other and roam about, perhaps in supercon- 
ducting, superfluid states. Others posit that heavier, ‘strange’ quarks 
form and become part of exotic cousins of the neutron. 

Pinning down the radii of neutron stars might allow physicists 
to evaluate the theories, because they predict different ‘equations 
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of state’ — formulae that link pressure, temperature and density 
of matter. Such equations determine to what extent matter can be 
compressed, and so how wide or narrow a neutron star will be for 
a given mass, and how massive such stars can get. 

The 100-second-long signal in August eventually became too 
high in pitch for LIGO and Virgo to detect, which prevented the 
observatories from seeing the two neutron stars’ final moments, 
when they should have deformed each other in ways that would 
have revealed their size and hardness, or resistance to compres- 
sion. Still, says B. S. Sathyaprakash, a LIGO theoretical physicist 
at the Pennsylvania State University in University Park, from that 
one event, “we can rule out equations of state that allow neutron- 
star sizes larger than 15 kilometres in radius” — a figure that is 
consistent with other measurements and favours ‘softer’ matter. 

Future detections — and detectors — will give much more 
detail. Sathyaprakash says that the Einstein Telescope, a possible 
next-generation observatory dreamed up by a team in Europe, 
could take physicists far beyond an upper limit. “We want to be 
able to pin down the radius to the level of 100 metres,” he says — a
precision that would be astounding, given that these objects are 
millions of light years away. 


SIREN CALLS 

Signals similar to GW170817, which was observed through both 
gravitational waves and light, could have dramatic implications 
for cosmology. Schutz calculated in 1985 that the frequency, or 
pitch, of waves from spiralling objects, together with the rate at 
which that pitch increases, reveals information about the objects’ 
collective mass. That determines how strong their waves should
be at the source. By measuring the strength of the waves that reach 
Earth — the amplitude of the signal actually picked up by inter- 
ferometers — one can then estimate the distance that the waves 
have travelled from the source. All other things being equal, a 
source that is twice as far, for example, will produce a signal half 
as strong. This type of signal has been dubbed a standard siren, 
in a nod to a common method of gauging distances in cosmol-
ogy: stars called standard candles have a well-known brightness,
which allows researchers to work out their distance from Earth. 
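The chain of reasoning in this paragraph can be made concrete with the leading-order (Newtonian) formulae for a circular inspiral. The frequency f and its rate of increase fix the 'chirp mass', the particular combination of the two masses that governs the inspiral; the chirp mass and f then fix how strong the waves are at the source, and the measured strain gives the distance, falling off as 1/distance. This sketch ignores the source's orientation, the detector's directional response and cosmological redshift, all of which real analyses must include. The function names and the example values (a chirp mass of 1.2 solar masses, 100 Hz, 40 megaparsecs) are illustrative assumptions.

```python
import math

# Rounded physical constants (SI units).
G = 6.674e-11        # gravitational constant
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
MPC = 3.086e22       # one megaparsec, m


def fdot_from_chirp_mass(chirp_mass_kg, f_hz):
    """Leading-order rate of increase of the wave frequency, in Hz per second."""
    return (96 / 5) * math.pi ** (8 / 3) * (G * chirp_mass_kg / C**3) ** (5 / 3) * f_hz ** (11 / 3)


def chirp_mass_from_signal(f_hz, fdot_hz_per_s):
    """Invert the relation above: pitch and its rate of increase give the chirp mass."""
    x = (5 / 96) * math.pi ** (-8 / 3) * f_hz ** (-11 / 3) * fdot_hz_per_s
    return (C**3 / G) * x ** (3 / 5)


def source_strength(chirp_mass_kg, f_hz):
    """Strain times distance (in metres) for a face-on source: strength at the origin."""
    return 4 * (G * chirp_mass_kg / C**2) ** (5 / 3) * (math.pi * f_hz / C) ** (2 / 3)


def strain_at_earth(chirp_mass_kg, f_hz, distance_m):
    """Measured amplitude falls off as 1/distance: twice as far, half as strong."""
    return source_strength(chirp_mass_kg, f_hz) / distance_m


def distance_from_strain(chirp_mass_kg, f_hz, h):
    """The standard-siren step: predicted source strength divided by measured strain."""
    return source_strength(chirp_mass_kg, f_hz) / h


if __name__ == "__main__":
    mc = 1.2 * M_SUN                     # illustrative chirp mass
    f = 100.0                            # illustrative wave frequency, Hz
    fdot = fdot_from_chirp_mass(mc, f)   # what a detector would measure
    recovered = chirp_mass_from_signal(f, fdot)
    h = strain_at_earth(recovered, f, 40 * MPC)
    print(f"fdot = {fdot:.1f} Hz/s, chirp mass = {recovered / M_SUN:.2f} M_sun")
    print(f"strain = {h:.2e}, distance = {distance_from_strain(recovered, f, h) / MPC:.1f} Mpc")
```

The round trip (mass to signal and back, strain to distance and back) shows why no external calibration is needed: the signal's own pitch and chirp rate predict its intrinsic strength.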

By coupling the distance measurement of GW170817 with an 
estimate of how fast the galaxies in that region are receding from 
Earth, Schutz and his collaborators made a new and completely 
independent estimate of the Hubble constant — the Universe’s 
current rate of expansion (see ‘Cosmic signposts’). The result,
part of a crop of papers released by LIGO, Virgo and some 70 other
astronomy teams on 16 October (see go.nature.com/2gbsgnq),
“ushers in a new era for both cosmology and astrophysics”, says
Wendy Freedman, an astronomer at the University of Chicago in 
Illinois who has made highly precise measurements of the Hub- 
ble constant, using time-honoured, but less-direct, techniques. 

As a direct and independent measure of this constant, standard
sirens could help to resolve a disagreement among cosmologists. 
State-of-the-art techniques, refined over nearly a century of work 
that started with Edwin Hubble himself, now give estimates that 
differ by a few per cent. This first standard-siren measurement 
does not resolve the tension: the expansion rate it predicts falls 
somewhere in the middle of the range and, because it is based 
on just one merger event, has a large error bar. But in the future, 
researchers expect standard sirens to nail down the Hubble con- 
stant with an error of less than 1%. So far, standard candles have 
done it with precisions of 2-3%. 
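Once each siren has a distance and its host galaxy a recession velocity, the Hubble constant is the slope of the straight line v = H0 × d (galaxies twice as distant recede twice as fast). A minimal sketch, assuming nearby sources (so that v is approximately c times the redshift z), independent measurements and a simple least-squares fit through the origin. The sample numbers are invented for illustration, not real measurements, and the 15% single-event error is a hypothetical figure. The last function shows the usual statistical argument for why many sirens are needed: if each event carries a similar independent error, the combined fractional error shrinks roughly as 1/sqrt(N).

```python
import math

C_KM_S = 299_792.458  # speed of light, km/s


def velocity_from_redshift(z):
    """Recession velocity for small redshift, v = c * z (valid only for z << 1)."""
    return C_KM_S * z


def hubble_constant(distances_mpc, velocities_km_s):
    """Least-squares slope of v = H0 * d through the origin, in km/s/Mpc."""
    numerator = sum(d * v for d, v in zip(distances_mpc, velocities_km_s))
    denominator = sum(d * d for d in distances_mpc)
    return numerator / denominator


def combined_fractional_error(single_event_error, n_events):
    """Fractional error after combining n independent events with equal errors."""
    return single_event_error / math.sqrt(n_events)


if __name__ == "__main__":
    # Invented, noise-free example: three sirens lying on a line with H0 = 70.
    distances = [40.0, 80.0, 120.0]
    velocities = [2800.0, 5600.0, 8400.0]
    print(f"H0 = {hubble_constant(distances, velocities):.1f} km/s/Mpc")
    for n in (1, 25, 225):
        print(n, "events ->", f"{combined_fractional_error(0.15, n):.3f}")
```

Under the 1/sqrt(N) assumption, going from a hypothetical 15% per event to 1% overall would take a couple of hundred events; real forecasts also have to account for systematic errors, which do not average away.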

THE GRAVITATIONAL-WAVE
SPECTRUM

Much like electromagnetic waves, gravitational waves are emitted by many different objects over a wide
range of frequencies. Terrestrial interferometers such as the Laser Interferometer Gravitational-Wave
Observatory (LIGO) and Virgo are sensitive to only a subset of those frequencies, which limits their
ability to ‘see’ certain cosmic phenomena. They won’t detect collisions of supermassive black holes
found in the hearts of galaxies, for example. But space-based interferometers and other approaches for
picking up gravitational waves could extend physicists’ reach.

COSMIC SIGNPOSTS

Neutron-star mergers are new tools for measuring the Hubble constant —
the current expansion rate of the Universe.

The gravitational-wave signal can be used to gauge the
distance from Earth to the former neutron-star pair.

Because the merger event also releases light, conventional
telescopes can be used to help pinpoint where it happened.

Redshifted light reveals how fast the galaxy and
those around it are speeding away from Earth.

Expanding Universe

The velocity and distance data — ideally from many such mergers — can
be combined to calculate the Hubble constant, which relates distance and
speed (galaxies twice as distant recede twice as fast).

Standard sirens could become even more powerful tools with
space-based interferometers such as the Laser Interferometer
Space Antenna (LISA), a trio of probes that the European Space
Agency, which is leading the mission, plans to launch in the
2030s. LISA is designed to be sensitive to low-frequency waves
that ground-based observatories cannot detect. This would
give it access to more-massive systems, which radiate stronger
gravitational waves. In principle, LISA could pick up sirens from
across the Universe and, with the help of conventional telescopes, 
measure not just the current rate of cosmic expansion, but also 
how that rate has evolved through the aeons. Thus, LISA could 
help to address cosmology’s biggest puzzle: the nature of dark 
energy, the as-yet-unidentified cosmic component that is driving 
the Universe's expansion to accelerate. 
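Why the heaviest systems call for a space-based detector follows from a simple scaling: the gravitational-wave frequency a binary reaches as it nears merger is inversely proportional to its total mass. A common rough marker is the innermost stable circular orbit around a non-spinning black hole, where the wave frequency (twice the orbital frequency) is c^3 / (6^1.5 × pi × G × M), about 4.4 kHz divided by the mass in solar units. The sketch below uses that approximation only. The band edges are rough assumed values (ground-based detectors from about 10 hertz to a few kilohertz, LISA from about 0.1 millihertz to 1 hertz), and the function names are hypothetical.

```python
import math

G = 6.674e-11      # gravitational constant, SI
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg

# Rough sensitive bands in hertz (assumed, order-of-magnitude only).
BANDS = {
    "ground-based (LIGO/Virgo-like)": (10.0, 5.0e3),
    "space-based (LISA-like)": (1.0e-4, 1.0),
}


def isco_wave_frequency_hz(total_mass_sun):
    """Wave frequency (twice the orbital one) at the innermost stable circular
    orbit of a non-spinning black hole with the given total mass."""
    m = total_mass_sun * M_SUN
    return C**3 / (6**1.5 * math.pi * G * m)


def bands_reaching(total_mass_sun):
    """Names of the assumed bands that contain the near-merger frequency."""
    f = isco_wave_frequency_hz(total_mass_sun)
    return [name for name, (low, high) in BANDS.items() if low <= f <= high]


if __name__ == "__main__":
    for mass in (2.7, 60.0, 1.0e6, 1.0e9):
        f = isco_wave_frequency_hz(mass)
        print(f"{mass:>8.3g} solar masses: ~{f:.2g} Hz -> {bands_reaching(mass) or 'neither band'}")
```

A pair of neutron stars or stellar-mass black holes lands in the ground-based band, a million-solar-mass pair in LISA's. The billion-solar-mass case falls below both assumed bands, in line with the article's point that pulsar-timing arrays are needed to catch such behemoths at earlier, even lower-frequency stages of their orbits.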

Whereas ground-based interferometers detect events that are 
brief and far between, LISA is expected to hear a cacophony of 
signals as soon as it turns on, including a constant chorus of tight 
binary white dwarfs — the ubiquitous remnants of Sun-sized stars 
— in our own galaxy. “It’s as if we lived in a noisy forest, and we 
had to single out the sounds of individual birds,” says astrophysi-
cist Monica Colpi of the University of Milan–Bicocca in Italy, who
is part of a committee setting the mission’s science goals. 

Occasionally, LISA should see black-hole mergers such as the 
ones LIGO does, but on a much grander scale. Most galaxies
are thought to harbour a central supermassive black hole that 
weighs millions, or even billions, of solar masses. Over a scale of 
billions of years, galaxies might merge several times; eventually, 
their central black holes might merge, too. These events are not 
frequent for individual galaxies, but because there are trillions of 
galaxies in the observable Universe, a detectable merger should 
occur somewhere at least a few times per year. Scientists are also 
pursuing a separate way of detecting gravitational waves from 
pairs of these behemoths at earlier stages of their orbits. Using 
radio telescopes, they monitor pulsars inside the Milky Way and 
look for small variations in their signals, caused by the passage 
of gravitational waves through the galaxy. Today, there are three 
‘pulsar-timing arrays’, in Australia, Europe and North America,
and a fourth forming in China. 

Thanks to LISA’s planned sensitivity, and the strong signals 
produced by spiralling supermassive black holes, the observatory 
should be able to pick up gravitational waves from pairs of super- 
massive black holes months before they merge, and see the merger 
in enough detail to test general relativity with high precision. After 
years of operation, LISA could accumulate enough distant events 
for researchers to reconstruct the hierarchical formation of galax-
ies — how small ones combined to form larger and larger ones — in
the Universe's history. 

On the ground, too, physicists are beginning some “grand new 
ventures”, Weiss says. A US team envisions a Cosmic Explorer with
40-kilometre detecting arms — 10 times as long as LIGO’s — that 
would be sensitive to signals from events much farther away, per- 
haps across the entire observable Universe. 

The concept for the Einstein Telescope in Europe calls for a 
detector with 10-kilometre arms arranged in an equilateral trian- 
gle and placed in tunnels 100 metres or so underground. The quiet 
conditions there could help to broaden the observatory’s reach, 
to frequencies one-tenth those detectable by current machines. 
That might allow scientists to find black holes beyond the range 
thought to be prohibited by pair-instability supernovae; at high
enough masses, stars should have a different collapse mechanism 
and be able to form black holes of 100 solar masses or more. 

If scientists are lucky, gravitational waves might even let them 
access the physics of the Big Bang itself, at epochs that are not 
observable by any other means. In the first instants of the Uni- 
verse, two fundamental forces — the electromagnetic force 
and the weak nuclear force — were indistinguishable. When 
these forces separated, they might have produced gravitational 
waves that, today, could show up as a “random hiss” detectable 
by LISA, Schutz says. This hypothetical signal is distinct from a 
much longer-wavelength one from even earlier on, which might 
appear in the Universe's oldest visible radiation: the cosmic micro-
wave background. In 2014, a team reported that it had observed
this effect with the BICEP2 telescope at the South Pole, but the
researchers later acknowledged problems with that interpretation.

With the reopening of both LIGO and Virgo late this year, the 
next big discovery on Weiss’ wish list is the signal from a collaps- 
ing star — something that astronomers might also observe as a 
type of supernova. But he has high hopes for what else might be 
on the horizon. “If we don't see something that we hadn't thought
of,” Weiss says, “I’d be disappointed.”


Davide Castelvecchi is a senior reporter for Nature in London. 
A safety driver sits behind the wheel during a test of a self-driving taxi in Yokohama, Japan.


People must retain control 
of autonomous vehicles 


Legislation on the testing of self-driving cars does not address liability and safety 
concerns, warn Ashley Nunes, Bryan Reimer and Joseph F. Coughlin. 


Last month, for the first time, a
pedestrian was killed in an accident
involving a self-driving car. A sports-
utility vehicle controlled by an autonomous
algorithm hit a woman who was crossing 
the road in Tempe, Arizona. The safety 
driver inside the vehicle was unable to 
prevent the crash. 

Although such accidents are rare, their 
incidence could rise as more vehicles that are 
capable of driving without human interven- 
tion are tested on public roads. In the past 
year, several countries have passed laws to
pave the way for such trials. For example,
Singapore modified its Road Traffic Act to 
permit autonomous cars to drive in desig- 
nated areas. The Swedish Transport Agency 
allowed driverless buses to run in northern 
Stockholm. In the United States, the House of 
Representatives passed the SELF DRIVE Act 
to harmonize laws across various states. Simi- 
lar action is pending in the US Senate, where 
a vote to support the AV START Act would 
further liberalize trials of driverless vehicles. 

Policymakers are enthusiastic about 
the potential of autonomous vehicles to
reduce road congestion, air pollution and
road-traffic accidents. Cheap ride-hailing
services could reduce the number of pri- 
vately owned cars. Machine intelligence can 
make driving more fuel-efficient, cutting 
emissions. Autonomous cars could help to 
save the 1.25 million lives worldwide that 
are lost each year through crashes, many of
which are caused by human error. 
Governments want to pass laws to make 
this happen (see ‘Road to autonomy’). 
But they are doing so by temporarily free- 
ing developers of self-driving cars from 
meeting certain transport safety rules.
These rules include the requirement that a 
human operator be inside the vehicle, that 
vehicles have safety features such as a steer- 
ing wheel, brakes and a mirror, and that the 
features are functional at all times. Some 
developers are maintaining these aspects, 
but they are not obliged to do so. There is 
no guarantee that autonomous vehicles will 
match the safety standards of current cars. 

Meanwhile, the wider policy implications 
are not being addressed. Governments
stand to lose billions of dollars in tax revenue
as rates of car ownership drop among
individuals. Millions of taxi, lorry and bus
drivers will lose their jobs. The machine-
learning algorithms on which autonomous
vehicles rely are far from developed enough 
to make choices that could mean life or death 
for pedestrians or drivers. 

Policymakers need to work more closely 
with academics and manufacturers to design 
appropriate regulations. This is extremely 
challenging because the research cuts across 
many disciplines. 

Here, we highlight two areas — liability 
and safety — that require urgent attention. 


LIABILITY 

Like other producers, developers of 
autonomous vehicles are legally liable for 
damages that stem from the defective design, 
manufacture and marketing of their products. 
The potential liability risk is great for driver- 
less cars because complex systems interact in 
ways that are unexpected. 

Manufacturers want to minimize the 
number of liability claims made against 
them. One way is to reduce the chance of
their product being misused by educating 
consumers about how it works and alerting 
them to safety concerns. For example, drug 
developers provide information on dosages 
and side effects; electronics manufacturers
issue instructions and warnings. Such
guidance shapes the expectations of con- 
sumers and fosters satisfaction. Yet, much 
like smartphones, self-driving cars are 
underpinned by sophisticated technologies 
that are hard to explain or understand. 

Instead, developers are designing such 
products to be easy to use. People are more
likely to buy a product that seems straight- 
forward and with which they can soon do 
complicated things, increasing its utility. 
However, users are then less able to anticipate 
how the underlying systems work, or to rec- 
ognize problems and fix them. For example, 
few drivers of computerized cars know how 
the engine is calibrated. Similarly, a passen-
ger in an autonomous vehicle will not know
why it chooses to make a sharp turn into 
oncoming traffic or why it does not overtake 
a slow-moving vehicle. 

Worse, deep-learning algorithms are 
inherently unpredictable. They are built on 
an opaque decision-making process that is 
shaped by previous experiences. Each car 
will be trained differently. No one — not 
even an algorithm’s designer — can know 
precisely how an autonomous car will 
behave under every circumstance. 

No law specifies how much training is 
needed before a deep-learning car can be 
deemed safe, nor what that training should 
be. Cars from different manufacturers could 
react in contrasting ways in an emergency. 
One might swerve around an obstacle; 
another might slam on the brakes. Rare traf- 
fic events, such as a truck tipping over in the 
wind, are of particular concern and, at best, 
make it difficult to train driverless cars. 

A driverless bus shuttles passengers across Southeast University’s Jiulonghu campus in Nanjing, China.

Advanced interfaces are needed that
inform users why an autonomous vehicle
is behaving as it does. Today’s dashboards
convey information about a car’s speed and
the amount of fuel that remains. Tomorrow's
displays must show the vehicle's ‘intentions’
and the logic that governs them; for example,
they might tell passengers that the car will 
not overtake the vehicle ahead because there 
is only a 10% likelihood of success. Little is 
known about the types of data that should be 
imparted and how people will interpret them. 
Users often ignore information, even if it is 
presented clearly and the consequences could 
be a matter of life or death. For instance, 
almost 70% of airline passengers do not 
review safety cards before a flight, despite
being asked. Yet these cards convey impor-
tant information, including how to put on an
oxygen mask and open an emergency exit, in
simple terms and on a single page.
Autonomous vehicles will need to 
communicate much more complicated infor- 
mation. Their sensors and algorithms must 
understand the behaviours of pedestrians, 
discriminate between styles of driving and 
adjust to changes in lighting. When they 
cannot, users must know how to respond. 
Researching ways to present this informa- 
tion effectively is paramount, as are legislative 
efforts to ensure that users of autonomous 
vehicles are proficient in using the technology. 
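The authors' overtaking example suggests the most basic job such an interface has to do: turn a planner's internal estimate into a plain statement of intent and reason. The sketch below is purely hypothetical and describes no manufacturer's system; the `Manoeuvre` type, the `explain` function and the 90% confidence threshold are illustrative assumptions. It covers only the wording step, not the harder open questions the authors raise about which data to show and how people will interpret them.

```python
from dataclasses import dataclass


@dataclass
class Manoeuvre:
    """A manoeuvre the planner is considering (hypothetical structure)."""
    description: str             # e.g. "overtake the vehicle ahead"
    success_probability: float   # planner's own estimate, between 0 and 1


def explain(manoeuvre: Manoeuvre, threshold: float = 0.9) -> str:
    """Turn a go/no-go decision into a sentence a passenger can read.

    The 0.9 threshold is an illustrative assumption, not an industry value.
    """
    pct = round(manoeuvre.success_probability * 100)
    if manoeuvre.success_probability >= threshold:
        return f"Will {manoeuvre.description}: {pct}% likelihood of success."
    return f"Will not {manoeuvre.description}: only a {pct}% likelihood of success."


if __name__ == "__main__":
    print(explain(Manoeuvre("overtake the vehicle ahead", 0.10)))
```

Even this toy version exposes a design choice regulators would face: whether passengers should see the raw probability, the threshold it was compared against, or only the decision.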


SAFETY 
The safety and efficiency benefits of 
autonomous cars rely on computers making 
better, quicker decisions than people. Users 
input their desired destination and thereafter 
cede control to the computer. Full autonomy 
has — deliberately — not yet been adopted 
in transportation. People are still perceived 
as being more flexible, adaptable and creative 
than machines, and better able to respond to 
changing or unforeseen conditions. Pilots
are able, therefore, to wrest control from fly- 
by-wire technology when key computers fail. 
The public is right to remain cautious 
about full automation. Manufacturers 
need to explain how a car would protect 
passengers should crucial systems fail. A 
driverless car must be able to stop safely if its 
hazard-avoidance algorithms malfunction, its 
cameras break or its internal maps fail. But this
is hard to engineer: for example, without cam- 
eras, such a car cannot see where it is going. 
In our view, some form of human 
intervention will always be required. 
Driverless cars should be treated much like 
aircraft, in which the involvement of people 
is required despite such systems being highly 
automated. Current testing of autonomous 
vehicles abides by this principle. Safety driv- 
ers are present, even though developers and 
regulators talk of full automation. 
Nonetheless, having people involved 
poses safety problems. Autonomous cars 
will always require users to have a minimum 
level of skill and will never be easy for some 
members of the public to operate. People 
with cognitive impairments, say, might 
find it difficult to operate these technolo- 
gies and to override controls. Yet this group 
includes those who would benefit greatly
from self-driving vehicles. For example,
older adults, a demographic of increasing
importance, have an elevated risk of crashes
because cognitive abilities decline with age.
Providing mobility for large numbers of
elderly people is an impetus for investment
in this technology in Japan, for instance.

ROAD TO AUTONOMY

The Netherlands heads the list of countries that are most prepared for autonomous vehicles.
Twenty nations were assessed according to four key areas of preparedness.

Singapore has the most supporting legislation and the whole city-state is a test area.

Japan has the most patents per person for driverless-vehicle technologies.

India is concerned about job losses for lorry and taxi drivers.

A remote supervisor could oversee 
driverless cars as air-traffic controllers do for 
aircraft. But how many supervisors would 
be needed to keep networks of such vehicles 
safe? Stretching human capacity too far can 
cause accidents. For example, in 1991,
an overwhelmed air-traffic controller in
Los Angeles, California, mistakenly cleared
an aeroplane to land on a runway where
another was waiting. Last year, an
overload of patients was blamed for a string 
of medical errors by doctors in Hong Kong. 


POLICY GAPS 

Current and planned legislation fails to 
address these issues. Exempting developers 
from safety rules poses risks. And develop- 
ers are not always required to report system 
failures or to establish competency standards
for vehicle operators. Such exemptions also
presume, wrongly, that human involvement 
will ultimately be unnecessary. Favouring 
industry over users will erode support for the 
technology from an already sceptical public. 

Present legislation sidesteps the education 
of consumers. The US acts merely require 
that users are “informed” about the technol- 
ogy before its use. Standards of competency 
and regular proficiency testing for users 
are not mentioned. Without standards, it 
is hard to tell whether consumer education 
programmes are adequate. And without 
testing, the risk of incidents might increase. 


MOVING FORWARD 

We call on policymakers to rethink their 
approach to regulating autonomous vehi- 
cles and to consider the following six points 
when drafting legislation. 

Driverless does not, and should not, 
mean without a human operator. Regula- 
tors and manufacturers must acknowledge, 
rather, that automation changes the nature 
of the work that people perform.


Users need information on how 
autonomous systems are working. Manu- 
facturers must research the limits and 
reliability of devices that are crucial for safety, 
including cameras, lasers and radars. When 
possible, they should make the data from 
these devices available to vehicle operators in 
an understandable form. 

Operators must demonstrate compe- 
tence. Developers, researchers and regulators 
need to agree proficiency standards for users 
of autonomous vehicles. Competency should 
be tested by licensing authorities and should 
supplement existing driving permits. Users 
who fall short should have their access to such 
vehicles limited, just as colour-blind pilots are 
banned from flying at night. 

Regular checks on user competency 
should be mandatory. Regulators, manu- 
facturers and researchers must determine a 
suitable time interval between tests, so that 
proficiency is kept up as cognitive abilities 
change and technology evolves. 

Remote monitoring networks should 
be established. Manufacturers, researchers 
and legislators need to build supervisory sys- 
tems for autonomous vehicles. Researchers 
should supply guidance on the number of 
vehicles that one supervisor can monitor 
safely, and on the conditions under which 
such monitoring is permissible. For example, 
more supervisors would be needed in poor 
weather conditions. 

Work limits for remote supervisors 
should be defined. Experts must clarify 
whether supervisors should be subject to 
existing working-time regulations. For 
example, air-traffic controllers are limited 
in how long they can work. 
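At their simplest, the staffing questions in the last three points reduce to arithmetic whose inputs researchers would have to supply. A minimal sketch, assuming a fixed safe load per supervisor, a workload multiplier for poor weather and a maximum shift length; every number and function name here is a hypothetical placeholder, since the authors stress that the real figures have yet to be determined.

```python
import math


def supervisors_needed(vehicles: int, safe_load: int, weather_factor: float = 1.0) -> int:
    """Supervisors required on one shift.

    safe_load: vehicles one supervisor can watch safely in good conditions
    weather_factor: >= 1; poor weather multiplies the workload per vehicle
    """
    if safe_load <= 0 or weather_factor < 1.0:
        raise ValueError("safe_load must be positive and weather_factor >= 1")
    return math.ceil(vehicles * weather_factor / safe_load)


def total_staff(per_shift: int, operating_hours: float, max_shift_hours: float) -> int:
    """Staff needed so that nobody works longer than the shift limit."""
    shifts = math.ceil(operating_hours / max_shift_hours)
    return per_shift * shifts


if __name__ == "__main__":
    # Placeholder numbers only: 100 vehicles, 10 per supervisor, 8-hour limit.
    clear = supervisors_needed(100, 10)
    storm = supervisors_needed(100, 10, weather_factor=1.5)
    print(clear, storm, total_staff(storm, 24, 8))
```

A real scheme would also need breaks, handovers and surge capacity for simultaneous incidents, which this sketch ignores.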

The path towards autonomy is far from 
preordained. Considerable challenges 
remain to be addressed.


Ashley Nunes, Bryan Reimer and 
Joseph F. Coughlin are in the AgeLab, 
Center for Transportation and Logistics, 
Massachusetts Institute of Technology, 
Cambridge, Massachusetts 02142, USA. 
e-mail: anunes@mit.edu 
Contemplating the night sky in Maine became a metaphysical experience for physicist Alan Lightman.


PHILOSOPHY OF SCIENCE 


A physicist faces the sublime 


Anil Ananthaswamy on Alan Lightman’s journeys in empiricism and experience. 


With his debut novel, Einstein’s
Dreams (1992) — the poetic 
musings of a Swiss patent clerk 
on the nature of time — theoretical physicist 
Alan Lightman revealed an enthusiasm for 
entering the human psyche. His latest book, 
the collection of essays Searching for Stars on 
an Island in Maine, goes further. Here, Light- 
man confronts the contradictions that arise 
from having a rigorous scientific world view, 
alongside his own mortal desires and fears. 
Lightman begins Searching for Stars 
with an account of a mystical experience. 
He's motoring through the coastal waters 
off mainland Maine, towards Pole Island, 
where he has a summer home. It’s a moon- 
less night. Before he docks, Lightman turns 
off the boat’s motor and running lights, and 
lies down in the silence and darkness. “After 
a few minutes, my world had dissolved into 
that star-littered sky. The boat disappeared. 
My body disappeared... I felt connected not 
only to the stars but to all of nature, and to 
the entire cosmos.” With that, Lightman 
begins an exploration of the tensions, both
within himself and without, between the
materialist reductionism of science, especially physics, and the
absolutes of spiritual belief.

As a physicist, he knows there are no
absolutes. The idea of a fixed and motionless
Earth was disproved in 1851 by the “slow
rotation of the plane of a swinging pendulum”
— physicist Léon Foucault's experiment —
which could be explained only if the planet,
not the pendulum, was rotating. Discoveries
of the electron and radioactivity showed us
that even atoms, once thought indestructible,
were anything but. Next, Albert Einstein
demolished Newtonian notions of absolute
space and time. Then came quantum
mechanics, with its claims of uncertainty
and indeterminism.

172 | NATURE | VOL 556 | 12 APRIL 2018

Searching for Stars on an Island in Maine
ALAN LIGHTMAN
Pantheon: 2018


For anyone looking to science for assur- 
ance, the bottom falls out. So, Lightman 
looks past it. “I am a scientist, but I am not a
swinging bob on a string,” he writes.

He sets the stage for a dialogue, introduc- 
ing us to the usual suspects in science: Galileo 
Galilei, J. J. Thomson, Ernest Rutherford 
and Einstein; and to a handful of spiritual 
thinkers and religious figures. We meet, 
for instance, fleetingly, the Indian poet and 
Nobel laureate in literature Rabindranath 
Tagore; and, more substantially, Augustine of 
Hippo, the influential fifth-century Christian 
theologian. 

“Augustine's certainties were absolute,” 
Lightman writes, contrasting these immutable
religious ideas (such as the immortal
soul) with science’s ever-evolving view. Yet,
he argues, science, too, longs for an absolute
in a final ‘theory of everything’, and has its
article of faith: “that the physical world is a
territory of order and logic”.

Lightman’s scope is sweeping, but he 
doesn't dig deeply enough. For instance, he 
expresses disbelief in bardo — the Tibetan 
Buddhist term signifying the transitory state
between death and rebirth. He writes: “I ask 
for some kind of evidence for all things I 
believe — even if it is evidence from a per- 
sonal or transcendent experience. And I 
insist on evidence for any statements that 
concern the physical world.” Certainly, there
is no ‘evidence’ for bardo, independent of the
subjective experiences of Tibetan Buddhists. But
Lightman stands by his own subjective experience
of perceived ‘oneness’ with something larger than
ourselves. A rigorous scientific approach would
question the veracity of all subjective experiences,
not just those that seem unreasonable at first blush.

Therein lies the book’s Achilles heel: it 
makes little mention of the research on per- 
ception that calls into doubt the ‘truth’ of 
subjective experiences, no matter how real 
or exalted they feel. Modern neuroscience 
tells us that what we perceive is not a bottom- 
up reconstruction by the brain of what's out 
there. Rather, it is the brain’s prediction 
about the probable causes of sensory inputs. 

Predictions, and thus perceptions, can 
be wrong. This is of particular impor- 
tance when perceptions hint at something 
spiritual. It’s a subject explored, for exam- 
ple, in the 2015 Kabbalah: A Neurocognitive 
Approach to Mystical Experiences by neurol- 
ogist Shahar Arzy and scholar of Jewish 
thought Moshe Idel. Mystical experiences 
are also eerily similar to those reported by 
people having ecstatic epileptic seizures, 
including feelings of time dilation and ‘oneness’,
the neural underpinnings of which are
under study (M. Gschwind and F. Picard 
Front. Behav. Neurosci. 10, 21; 2016). 

The book’s narrative structure — set 
up as the articulate reveries of a physicist, 
who is alternately naturalist, stargazer 
and philosopher, wandering around his 
island, constantly thinking grand thoughts 
on mossy slopes — risks becoming self- 
parody. Lightman saves the day somewhat 
by acknowledging the indulgence. 

However, as a broad take on intellectual 
thought at the intersection of science and 
spirituality, Searching for Stars is stimulating. 
Lightman is to be admired for his willing- 
ness to take off his scientist’s hat and plunge 
into preoccupations most of his peers would 
strenuously avoid, some for fear of ridicule. 
Once again, this deft wordsmith has effort- 
lessly straddled the divide between the hard- 
est of the hard sciences and the nebulous 
world of existential doubts and longings.


Anil Ananthaswamy is a journalist and 
author of The Man Who Wasn’t There, an
exploration of the neuroscience of the sense 
of self. 


e-mail: anil@nasw.org 
Books in brief


Burning Planet 

Andrew C. Scott OXFORD UNIVERSITY PRESS (2018) 

Megafires regularly crackle through the headlines, yet wildfire 
management remains largely misguided. Geologist Andrew
Scott redresses the balance in this scholarly yet accessible study,
drawing on ground and satellite observation as well as his original 
research into the 400-million-year history of fire on Earth. Through 
technologies such as scanning electron microscopy, Scott’s study 
of fossil charcoal has unearthed an astounding deep-past record 
of botanical riches and shifts in climate and oxygen levels. A timely 
book in an era of heightened fire risk and threats to water supply. 
Rainforest: Dispatches from Earth’s Most Vital Frontlines

Tony Juniper PROFILE (2018) 

The “green oceans” that are tropical rainforests help to regulate Earth’s 
water, climate and carbon cycles; support 50% of terrestrial flora
and fauna; and offer a lifeline to 1.6 billion people. Yet half have been
cleared, in large part by consumer-led interests, from cattle ranching to 
palm-oil production. Environmentalist Tony Juniper surveys the terrain 
through myriad lenses: the bitter history of exploitation and its impact 
on indigenous peoples; the stupendous biological riches; and the 
conservation science and community involvement that, given political 
and industrial will, could halt the felling. 


The Efficiency Paradox: What Big Data Can’t Do 

Edward Tenner KNOPF (2018)

We pursue efficiency through engulfment in the digital. Yet, argues 
historian of technology Edward Tenner in this perceptive study, 
the promise of big data and algorithms for information, education, 
medicine and beyond is dissipating. The Silicon Valley dream of a 
frictionless existence is failing because ethical, political and social 
elements were factored in poorly, spawning issues such as flawed 
algorithms. Sympathetically critiquing the work of others in this 
arena, including Nicholas Carr and Cathy O’Neil, Tenner calls for a
strategy that blends intuition and experience with high technology. 


Audubon’s Last Wilderness Journey 

Marilyn Laufer et al. GILES (2018) 

Forget birds: otters, bison, armadillos, black bears, elk, beavers and 
other New World mammals starred in ornithologist John James 
Audubon’s last great work of natural-history illustration. Published in 
three volumes between 1845 and 1848, and inspired by Audubon’s 
1843 journey up the Missouri River, the original featured 150 hand- 
coloured illustrations. Curators at the Jule Collins Smith Museum
of Fine Art at Alabama’s Auburn University have now made them
available to all. Accompanying the striking reproductions are fresh 
essays on hunting, conservation, wilderness, mammalogy and more. 




Science Not Silence 

Edited by Stephanie Fine Sasse and Lucky Tran MIT PRESS (2018) 
More than one million researchers, postdocs and science aficionados 
took to the streets across some 600 cities on 22 April 2017. The 
March for Science was a riposte to the US administration’s ennui 
around research; it aimed to reify the fundamental, multidimensional 
importance of science in tackling global challenges and advancing 
knowledge. This vibrant photo-essay compilation, edited by science 
communicators Stephanie Fine Sasse and Lucky Tran, pays homage 
to the international community and its resilience, creativity and 
ongoing commitment to speaking truth to power. Barbara Kiser 
Correspondence


Don’t dismiss non- 
English citations 


We find it inexcusable for peer 
reviewers to dismiss citations 
to scientific papers that are not 
published in English. Journals 
written in other languages are 
a valuable repository for much 
locally relevant applied science 
(see, for example, M. Neff Nature 
554, 169; 2018). And in most 
countries today, these works 
are accessible through free, 
automated translation services. 

We experienced such 
discrimination after submitting 
a paper to an English-language 
journal. It was a bibliometric 
evaluation of research activities 
at universities in Belarus and 
Ukraine, so some citations 
were inevitably in Russian. One 
reviewer complained that this 
“precludes ... checking that source 
to determine if it does actually 
support the authors’ statements”.
Another demanded more 
information in the text about 
the work of an internationally 
recognized bibliometrician, Irina 
Marshakova-Shaikevich, “since 
she writes in Russian”. 

In our view, substituting 
non-English citations with 
anglophone alternatives risks 
transposing credit for ideas 
and violates citation standards. 
Papers should be evaluated 
on academic criteria, not 
on superficial grounds of 
communication. 

Vladimir S. Lazarev Belarusian 
National Technical University, 
Minsk, Belarus. 

Serhii A. Nazarovets Kiev 
National University of Culture 
and Arts, Kiev, Ukraine. 
vslazarev@bntu.by


Building rapport for 
better policymaking 


Marie Claire Brisbois and 
colleagues advise scientists to 
interact with government policy 
analysts to improve evidence- 
based policy (see Nature 555, 
165; 2018). In our experience, the 
interaction between academia 
and policymakers needs to be a
two-way process. 

We are members of the Centre 
for the Evaluation of Complexity 
across the Nexus, a consortium 
of academics and practitioners 
who work with UK government 
departments and agencies to 
improve policy evaluation and 
design across the water-energy- 
food-environment nexus 
(www.cecan.ac.uk). We test and 
promote innovative methods and 
approaches through co-designed 
and co-produced case studies that 
span, for instance, rural policy 
after Brexit, energy security and 
food-safety regulation. 

Progress in these complex 
policy areas depends on sharing 
knowledge and building trust 
and capacity with civil servants 
across the political spectrum. 
Adam P. Hejnowicz, Sue E. 
Hartley University of York, UK. 
Nigel Gilbert University of 
Surrey, Guildford, UK. 
adam.hejnowicz@york.ac.uk


World Heritage Site 
fish faces extinction 


The North Sea houting 
(Coregonus oxyrinchus) is a 
whitefish that is endemic to the 
Wadden Sea, an area including 
the North Sea coasts of the 
Netherlands, Germany and 
Denmark. A critically small 
population in Denmark's Vidaa 
River, estimated at 3,500 adult 
individuals in 2014, is the last 
remaining worldwide. We call on 
the Danish authorities to prevent 
further decline of this fish 
through informed conservation 
planning and management 
before it is too late. 

The Wadden Sea is a World 
Heritage Site that harbours the 
world’s largest unbroken system of 
intertidal sand and mud flats. The 
North Sea houting is protected 
under the Bern Convention and 
the EU Habitats Directive. Yet 
Denmark’s conservation efforts 
since 1992 have been limited to 
population estimates, insufficient 
regulation of the predatory 
great cormorant (Phalacrocorax 
carbo) and unsuccessful
habitat-restoration projects. 

The habitats needed by this 
fish for spawning and juvenile 
development are still unclear, 
so it is not possible to protect 
or restore them. This basic 
knowledge is essential for future 
restoration projects. We urgently 
need to understand why the 
population is still in decline and 
to put effective conservation 
measures in place. The North 
Sea houting must not end up 
next to the great auk (Pinguinus 
impennis) on museum shelves. 
Jon C. Svendsen Technical 
University of Denmark, Kongens 
Lyngby, Denmark. 

Aage K. O. Alstrup Aarhus 
University, Aarhus, Denmark. 
Lasse F. Jensen Aalborg 
University, Aalborg, Denmark. 
jos@aqua.dtu.dk 


Pesticide policies 
need holistic view 


New pesticide policies are 
needed for more sustainable 
agricultural production, but 
their wider implications need 
to be considered. Efforts to ban 
ubiquitous pesticides such as 
glyphosate and neonicotinoids 
are ongoing (see, for example, 
Nature 555, 150-151; 2018). 

In Switzerland, proposals have 
been made to suspend subsidies 
for farms that use pesticides and 
to ban all synthetic pesticides. In 
Italy, the municipality Mals has 
banned pesticide use by farmers. 
Furthermore, private industries 
are increasingly restricting 
pesticides and have introduced 
labels for glyphosate-free 
products. 

Stricter policies can have 
unintended effects, however. They 
may encourage changes in land 
use and management practices 
that decrease food production 
and quality, or increase soil 
erosion and greenhouse-gas 
emissions. Banned pesticides 
might even be substituted with 
more harmful ones. 

Technologies such as sensors, 
drones and robots could help to 
monitor and control pesticide
application (see A. Walter
et al. Proc. Natl Acad. Sci.
USA 114, 6148-6150; 2017).
Pesticide taxation is another, 
complementary possibility
(R. Finger et al. Ecol. Econ. 134,
263-266; 2017). 

To avoid misguided policies, 
trade-offs between different 
policy goals need to be quantified 
for a holistic assessment. For 
example, modelling approaches 
could assess the impact of 
more-stringent pesticide policies 
on plant protection and land 
use and quantify the economic 
consequences (T. Böcker et al.
Ecol. Econ. 145, 182-193; 2018). 
Robert Finger ETH Zurich, 
Switzerland. 
rofinger@ethz.ch 


Singapore Index for 
climate change 


The Singapore Index of Cities’ 
Biodiversity was set up ten 
years ago by the National Parks 
Board of Singapore and the 
United Nations Secretariat of 
the Convention on Biological 
Diversity as urban development 
boomed. I suggest that this self- 
assessment tool could also be 
applied to safeguard biodiversity 
against the effects of climate 
change on cities (see X. Bai et al. 
Nature 555, 23-25; 2018). 

The index consolidates 
important biodiversity indicators 
to help cities to evaluate and 
benchmark their conservation 
efforts (see go.nature.com/2hammaa). The National
Parks Board of Singapore (see 
www.nparks.gov.sg) received 
the 2017 UNESCO Sultan 
Qaboos Prize for Environmental 
Preservation, and the board's 
experience could benefit cities 
across the globe. 

Singapore should continue 
to apply the index to manage 
biodiversity in the face of climate 
change, for example in heat islands 
or in areas prone to flooding. 
Sameen Ahmed Khan Dhofar 
University, Salalah, Oman. 
rohelakhan@yahoo.com 
QUANTUM PHYSICS


The certainty of randomness 


Communication systems rely on random-number generators for the encryption of information. A method for producing 
truly random numbers even from untrustworthy devices could lead to improvements in security. SEE LETTER P.223 


STEFANO PIRONIO 


Encryption schemes used in modern
communication systems make extensive use of
random, unpredictable numbers to
ensure that an adversary cannot decipher 
encrypted data or messages. Reliable random- 
number generators are therefore crucial. For 
instance, an Internet-wide analysis identified 
tens of thousands of servers that are vulner- 
able to basic attacks because of the use of 
poor-quality random-number generators’. 
On page 223, Bierhorst et al.’ exploit effects at 
the crossroads of quantum physics and special 
relativity to demonstrate the ultimate random- 
number generator, achieving unprecedented 
security. 

Although schemes to generate random- 
looking numbers are easy to come up with, 
assessing their security — the extent to which 
they are truly unpredictable to a potential 
adversary — is notoriously difficult. Much 
of the trouble stems from the fact that such 
schemes cannot be tested by merely looking at 
their output from a black-box perspective: that 
is, a perspective from which the internal work- 
ings are unknown. For instance, certain arith- 
metic operations known as pseudorandom 
number generators produce sequences of 
numbers that are completely predictable. How- 
ever, these sequences do not have any recog- 
nizable patterns and thus, from the perspective 
of someone who does not know how the num- 
bers have been generated, they cannot easily 
be distinguished from sequences obtained by 
truly random methods. 
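To make this concrete, here is a minimal sketch of a linear congruential generator, one of the simplest pseudorandom number generators. The constants (multiplier 1664525, increment 1013904223, modulus 2**32) are a commonly quoted textbook choice, assumed here purely for illustration, and the helper names `lcg` and `first_bits` are hypothetical. Its output bits show no obvious pattern, yet anyone who knows the seed and the constants can reproduce every one of them.

```python
from typing import Iterator


def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32) -> Iterator[int]:
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m.

    The constants are an illustrative textbook choice, not a secure one.
    """
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x


def first_bits(seed: int, n: int) -> list[int]:
    """Take the top bit of each of the first n outputs as a 0/1 sequence."""
    gen = lcg(seed)
    return [next(gen) >> 31 for _ in range(n)]


# To an outsider the bits look patternless...
user_bits = first_bits(seed=2018, n=32)
# ...but an adversary who knows the seed and constants predicts them exactly.
adversary_bits = first_bits(seed=2018, n=32)
print(user_bits == adversary_bits)  # True
```

Python's own `random` module is likewise a deterministic generator (a Mersenne Twister) and is documented as unsuitable for security purposes; the standard library's `secrets` module is the route to cryptographic randomness.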

It would therefore seem that security can be 
established only if the random-number gen- 
erator satisfies two conditions. First, the user 
must know how the numbers have been gen- 
erated to verify that a valid procedure is being 
implemented. And second, the system must be 
a black box from the adversary’s perspective 
to prevent them from exploiting knowledge 
about its internal mechanism. 

However, the first condition is unrealistic. 
A random-number generator can deviate 
from its intended design because of imperfec- 
tions, component ageing, accidental failures 
or explicit tampering by an adversary, lead- 
ing to undetected biases. And monitoring the 
internal mechanism of a random-number 
generator in real time is both impractical and 
Figure 1 | A quantum random-number generator. Bierhorst et al.* report an experiment that produces
strings of truly random bits (0s and 1s), which are desirable for improving the security of a wide range 

of communication systems. The authors prepared a pair of photons (blue and red) that were entangled, 
meaning that their properties were strongly correlated. They then sent each photon to a different remote 
measurement station, where the photons’ polarizations were recorded. The measurement outcomes from 
the two stations were unpredictable, thanks to the strongly correlated behaviour and large separation of the
photons. However, the randomness was small, even after millions of runs. The authors used a powerful 
post-processing technique to generate truly random bits from these measurements, with minimal
physical assumptions about the photons’ behaviour.


difficult®. Moreover, the second condition 
violates Kerckhoffs’s principle — a central 
tenet of modern cryptography that was 
reformulated by the father of information 
theory, Claude Shannon‘, as “the enemy knows 
the system being used”. In other words, crypto- 
graphic systems should be designed under the 
assumption that an adversary will quickly gain 
familiarity with them. 

Remarkably, thanks to the unusual laws 
of quantum physics, it is possible to create a 
provably secure random-number generator 
for which the user has no knowledge about the 
internal generation mechanism, whereas the 
adversary has a fully detailed description of it. 

To understand how this works, consider 
the experiment carried out by Bierhorst and 
colleagues (Fig. 1). The authors prepared two 
photons in a peculiar quantum condition 
known as an entangled state. They then sent 
each photon to a different remote measure- 
ment station, where the photons’ polariza- 
tions were recorded. During measurement, 
the photons were unable to interact with each 
other — the stations were so distant that this 
would require signals travelling faster than the 
speed of light. Nevertheless, the measurement
outcomes were strongly correlated because of
the photons’ entangled nature. Such correla- 
tions can be detected experimentally through 
statistical criteria known as violations of Bell 
inequalities’. 
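The best-known example of such a statistical criterion is the CHSH form of Bell's inequality. It is used here as an assumption for illustration: Bierhorst and colleagues' analysis relies on its own test statistic, which this sketch does not reproduce. The code compares the largest CHSH value that any local strategy with predetermined outcomes can reach (2) with the textbook quantum prediction for an ideal polarization-entangled photon pair, for which the correlation between outcomes of ±1 at polarizer angles a and b is taken to be E(a, b) = cos 2(a - b).

```python
import itertools
import math


def chsh(E, a0, a1, b0, b1):
    """CHSH combination S = E(a0,b0) + E(a0,b1) + E(a1,b0) - E(a1,b1)."""
    return E(a0, b0) + E(a0, b1) + E(a1, b0) - E(a1, b1)


def local_deterministic_max():
    """Largest |S| over all local strategies that fix each outcome in advance.

    Each station assigns +1 or -1 to each of its two settings (16 strategies).
    S is linear in the probabilities, so mixing strategies cannot beat the best one.
    """
    best = 0
    for A0, A1, B0, B1 in itertools.product((-1, 1), repeat=4):
        s = A0 * B0 + A0 * B1 + A1 * B0 - A1 * B1
        best = max(best, abs(s))
    return best


def quantum_E(a, b):
    """Assumed ideal correlation for the state (|HH> + |VV>)/sqrt(2)."""
    return math.cos(2 * (a - b))


deg = math.pi / 180
S_quantum = chsh(quantum_E, 0 * deg, 45 * deg, 22.5 * deg, -22.5 * deg)
print(local_deterministic_max())  # 2
print(round(S_quantum, 4))        # 2.8284
```

The gap between 2 and 2√2 ≈ 2.83 is the ideal margin; real experiments, with imperfect detectors and states, observe smaller violations.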

The strongly correlated behaviour of the two
remote photons suggests that they could be 
used to devise a faster-than-light communi- 
cation device. This would indeed be possible 
unless the photons’ measurement outcomes 
were unpredictable, in which case any attempt 
to use such photons in a communication 
device would fail, because it would result in 
scrambled, indecipherable messages. Because 
faster-than-light communication is impossi- 
ble, it follows that violations of Bell inequalities 
imply random measurement outcomes. That 
is, the violations provide an experimental 
signature of randomness. 
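The no-signalling side of this argument can be checked in the same idealized model, assumed here to be the state (|HH> + |VV>)/√2 with joint outcome probabilities P(x, y | a, b) = [1 + xy cos 2(a - b)]/4 for outcomes x, y = ±1. The function names are hypothetical. However strongly the two outcomes are correlated, one station on its own always sees a fair coin, whatever setting the distant station chooses, so the correlations alone carry no usable message.

```python
import math


def joint_prob(x: int, y: int, a: float, b: float) -> float:
    """Assumed ideal-model probability of outcomes x, y (each +1 or -1)
    at polarizer angles a, b (radians)."""
    return (1 + x * y * math.cos(2 * (a - b))) / 4


def alice_marginal(x: int, a: float, b: float) -> float:
    """P(x | a, b): one station's outcome probability, summed over the other's."""
    return sum(joint_prob(x, y, a, b) for y in (+1, -1))


a = 0.0
for b_deg in (22.5, -22.5, 67.5):
    b = math.radians(b_deg)
    # The value does not depend on the remote setting b: no signal gets through.
    print(b_deg, round(alice_marginal(+1, a, b), 12))  # always 0.5
```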

This conclusion depends only on the impos- 
sibility of faster-than-light signalling and not 
on any detailed description of the associated 
quantum systems. It must therefore be true 
from an adversary’s perspective, regardless 
of their particular knowledge of the quantum 
processes being carried out. And because vio- 
lations of Bell inequalities can be verified by 
a user only from the statistics of the observed
outputs of such processes, the verification 
procedure represents a black-box test of 
randomness. 

Violations of Bell inequalities have been 
observed in numerous experiments over the 
past three decades’, and their qualitative con- 
nection to randomness has been known for 
many years. However, quantum-information 
researchers have started to develop the tools 
to exploit this connection only in the past 
few years’.

A key difficulty has been that most experi- 
ments that violate Bell inequalities are affected 
by loopholes, meaning that they cannot be 
considered as black-box demonstrations. For 
instance, the constraint that the two photons 
cannot exchange signals at subluminal speeds 
was not strictly enforced in the two previous 
demonstrations of randomness generation 
based on Bell inequalities”*. In the past few 
years, loophole-free experiments have been 
carried out”, but they remain a technologi- 
cal challenge. In particular, the magnitude of 
the Bell-inequality violations observed in these 
experiments, although sufficient to confirm 
the correlated behaviour of the photons, was 
too low to verify the presence of randomness of 
sufficient quality for cryptographic purposes. 

Bierhorst and co-workers have improved 
existing loophole-free experimental set-ups 
to the point at which the realization of such 
randomness becomes possible. However, this 
threshold is barely reached. Every time a pho- 
ton is measured in the authors’ experiment, 
the randomness that is generated (expressed 
as bits; 0s and 1s) is equivalent to tossing a 
coin that has 99.98% probability of landing on
heads. 
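The coin analogy can be quantified with the min-entropy, H_min = -log2(p_max), a standard measure of how many nearly uniform bits a source can yield. The sketch below assumes, for illustration only, that every trial behaved like an independent coin with p_max = 0.9998. It ignores the finite-statistics corrections and error tolerances that a real certification (including the authors') must include, so it gives an optimistic estimate, not their result.

```python
import math


def min_entropy_bits(p_max: float) -> float:
    """Min-entropy of one trial whose most likely outcome has probability p_max."""
    return -math.log2(p_max)


per_trial = min_entropy_bits(0.9998)   # about 2.9e-4 bits per photon pair
trials = 55_000_000                    # photon pairs measured (figure from the text)
naive_total = per_trial * trials       # idealized estimate assuming independent trials

print(f"{per_trial:.3e} bits per trial")
print(f"{naive_total:,.0f} bits in total (idealized)")
```

The idealized total is only an upper estimate under the stated independence assumption; the certified output reported in the text is 1,024 bits.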

Over many runs, the sequence of measure- 
ment outcomes should have accumulated 
enough uncertainty that truly random bits 
could be extracted through clever post- 
processing. However, no existing methods for 
analysing such sequences would have been 
efficient enough to reach this goal. Bierhorst 
et al. therefore introduced a powerful statistical 
technique, tailored to the weak Bell-inequality 
violations they observed, that achieved this 
aim. Ultimately, the authors were able to gen- 
erate 1,024 random bits in about 10 minutes 
of data acquisition — corresponding to the 
measurement of 55 million photon pairs. 
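The post-processing step is a randomness extractor: a function that compresses a long, weakly random string into a short, nearly uniform one, with the help of a short, uniformly random seed. As an illustration only, the sketch below implements Toeplitz hashing, a standard textbook extractor that is not necessarily the one Bierhorst and colleagues used; the function name and the toy sizes are hypothetical. Each output bit is the parity (XOR) of a different, overlapping selection of raw bits chosen by the seed.

```python
import random


def toeplitz_extract(raw: list[int], seed: list[int], m: int) -> list[int]:
    """Compress n raw bits to m output bits with a seeded Toeplitz matrix.

    The m-by-n matrix T has constant diagonals, T[i][j] = seed[i - j + n - 1],
    so it is fixed by n + m - 1 seed bits. Output bit i is the parity
    (sum mod 2) of the raw bits selected by row i of T.
    """
    n = len(raw)
    if len(seed) != n + m - 1:
        raise ValueError("seed must contain n + m - 1 bits")
    out = []
    for i in range(m):
        parity = 0
        for j in range(n):
            parity ^= seed[i - j + n - 1] & raw[j]
        out.append(parity)
    return out


# Toy example: 64 heavily biased raw bits (mostly 0s) compressed to 8 bits.
rng = random.Random(1)                       # toy, reproducible stand-in for a seed source
raw = [1 if rng.random() < 0.2 else 0 for _ in range(64)]
seed = [rng.getrandbits(1) for _ in range(64 + 8 - 1)]
print(toeplitz_extract(raw, seed, 8))
```

The toy input has a min-entropy of about 64 × (-log2 0.8) ≈ 20.6 bits, comfortably more than the 8 bits extracted; the leftover hash lemma requires such a margin, and it grows with the desired closeness to uniform. The `random` module only makes the toy reproducible and must not supply seeds in a real system.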

Bierhorst and co-workers’ random-number 
generator represents the most meticulous 
and secure method for producing random- 
ness that has ever been demonstrated. How- 
ever, its generation rate is much lower than 
in more-conventional commercial quantum 
random-number generators, which can pro- 
duce millions of random bits per second”. 
Nevertheless, improvements in the generation 
rate can reasonably be expected to the point at 
which this will no longer be a strong limiting 
factor. 

More problematic is the size of the authors’ 
random-number generator: it consists
of measurement stations that are 187 metres
apart to prevent subluminal signalling between 
the photon pairs. This distance might be 
reduced in the future, but it is hard to imagine 
how it could reach the dimensions of more- 
standard electronic hardware (at most, a few 
centimetres) using foreseeable technology. 
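A quick calculation shows why the separation is so costly. In a simplified picture that ignores the detailed event timing of the real experiment, each station must choose its setting and record its outcome in less time than light takes to cross the gap between the stations. The sketch below computes that light-travel time for the 187-metre separation quoted in the text and for a hypothetical 5-centimetre device.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second (exact, by definition of the metre)


def light_time_ns(distance_m: float) -> float:
    """Time in nanoseconds for light to travel distance_m in vacuum."""
    return distance_m / SPEED_OF_LIGHT * 1e9


print(f"187 m: {light_time_ns(187):.0f} ns")         # about 624 ns
print(f"5 cm:  {light_time_ns(0.05) * 1e3:.0f} ps")  # about 167 ps
```

At a few centimetres, the window shrinks to a hundred-odd picoseconds for choosing a setting and completing a measurement.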

Although Bierhorst and colleagues’ study 
will therefore not directly lead to practical, 
consumer-grade random-number generators, 
it sets a new direction and ideal for the secure 
production of random bits. The authors’ 
approach and theoretical methods could be 
adapted to much more practical and simple 
designs for random-number generators that 
potentially retain many of the conceptual and 
security benefits of their work.
Mirrors made of
a single atomic layer 


Researchers have demonstrated that atomically thin materials can be highly reflective, contrary to general thinking. This finding could have technological implications for nanophotonics, optoelectronics and quantum optics.


KIN FAI MAK & JIE SHAN 


The discovery of a single layer of carbon
atoms, known as graphene’, led to
great interest in 2D materials. Whereas
graphene is highly transparent to visible 
light’, 2D materials that are highly reflective 
could be used as lightweight mirrors in opti- 
cal or optoelectronic systems. The existence 
of such materials has been questioned, but, 
writing in Physical Review Letters, Back et al.° 
and Scuri et al.* report that single layers of 
molybdenum diselenide can have high levels 
of reflectance. 

The importance of the authors’ work can 
be understood by considering the reflection 
of light from a homogeneous, free-standing 
thin film of material. When a wave of light 
of a particular colour — or, equivalently, 
frequency — hits the film, the oscillating 
electric field that is associated with the light 
wiggles the charged particles in the material. 
This drives the oscillation of electric dipoles 
(separations between positively and negatively 
charged particles) at the same frequency as 
that of the incident light (Fig. 1a). 

The oscillating dipoles re-radiate light waves 
in both the forward and backward directions 
with respect to the direction of the incident 
wave. Whereas the latter occurrence gives 
rise to reflection, the former destructively
interferes with the incident wave, producing
transmitted light that has a lower intensity 
than that of the incident light. The material's 
response to an oscillating electric field is, in 
general, not uniform with respect to incident 
waves from across the electromagnetic spec- 
trum. At a particular frequency, the dipoles 
have a large oscillation amplitude — a phe- 
nomenon known as resonance — which 
results in more reflection and less transmission 
of light than at any other frequency. 

Like all oscillators in real physical systems, 
the oscillations of the dipoles are damped, 
which means that they die out if the event that 
drives them is stopped. There are two ways in 
which the energy that is stored in the dipoles 
can be lost: it can be re-radiated (as discussed 
previously) or it can be absorbed by the mater- 
ial and converted into heat. These processes 
are known as radiative and non-radiative 
damping, respectively. In most materials, both 
mechanisms of damping operate. The inci- 
dent light is therefore partly reflected, partly 
absorbed and partly transmitted. 

However, in a material in which radiative 
damping dominates, the absorption losses 
would be negligible, and all of the incident 
electromagnetic energy would be re-radiated. 
Furthermore, the re-radiation in the 
forward direction would perfectly cancel 
out the incident light, through destructive 
Figure 1 | A conventional material versus a perfect mirror. a, When a wave of light hits a thin film of
an ordinary material, it produces electric dipoles — separations between positively charged (orange) and 
negatively charged (blue) particles. These dipoles oscillate (red arrow) at the same frequency as that of the 
incident light. They re-radiate light waves in both the forward and backward directions, which gives rise 
to transmission and reflection, respectively. b, By contrast, in a hypothetical perfect mirror, there is no 
absorption or transmission, and the incident light is reflected entirely. Back et al.’ and Scuri et al.’ report 
near-perfect mirrors made of a single layer of the material molybdenum diselenide. 


interference. Owing to conservation of 
energy, the incident light would be reflected 
entirely, and the material would act as a perfect 
mirror (Fig. 1b). This holds true even when 
the material comprises a single layer of atoms, 
provided that the oscillating dipoles are being 
driven at their resonance frequency. 
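The argument can be made quantitative with a standard single-resonance model of a free-standing sheet, taken here as an assumption rather than the authors' full analysis (their samples are encapsulated in boron nitride, which the model ignores). With radiative damping rate γ_r, non-radiative damping rate γ_nr and detuning Δ from resonance, the reflected amplitude is r = -(iγ_r/2)/(Δ + i(γ_r + γ_nr)/2), and the transmitted amplitude is t = 1 + r, where the 1 is the incident wave that the forward re-radiation interferes with. At resonance this gives R = γ_r²/(γ_r + γ_nr)², T = γ_nr²/(γ_r + γ_nr)² and absorption A = 2γ_rγ_nr/(γ_r + γ_nr)², so R approaches 1 as γ_nr approaches 0. The function name is hypothetical.

```python
def sheet_response(gamma_r: float, gamma_nr: float, detuning: float = 0.0):
    """Reflectance R, transmittance T and absorptance A of a free-standing
    sheet with one dipole resonance (simple single-Lorentzian model).

    gamma_r: radiative damping rate; gamma_nr: non-radiative damping rate;
    detuning: frequency offset from resonance (same units as the rates).
    """
    r = (-0.5j * gamma_r) / (detuning + 0.5j * (gamma_r + gamma_nr))
    t = 1 + r  # incident wave plus the forward re-radiated wave
    R = abs(r) ** 2
    T = abs(t) ** 2
    A = 1 - R - T  # whatever is neither reflected nor transmitted is absorbed
    return R, T, A


for g_nr in (0.0, 0.1, 1.0):
    R, T, A = sheet_response(gamma_r=1.0, gamma_nr=g_nr)
    print(f"gamma_nr/gamma_r = {g_nr}: R={R:.3f}, T={T:.3f}, A={A:.3f}")
```

With no non-radiative loss the sheet reflects everything at resonance; when the two damping rates are equal, a quarter of the light is reflected, a quarter transmitted and half absorbed.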

Although theoretical studies have 
suggested that such conditions could be 
realized in a 2D array of ultracold atoms”®, 
the authors demonstrate near-perfect mirrors 
in a solid-state system. They use a single layer 
of molybdenum diselenide, which is a semi- 
conductor and belongs to a family of materials 
known as the transition-metal dichalcogenides.
In such materials, the oscillating dipoles gener- 
ated by the incident light are excitons’ — bound 
pairs of an electron and a hole (the absence 
of an electron). The more tightly bound the 
excitons are, the larger the radiative damp- 
ing will be, and the more perfectly the mirror 
will behave. Previous experimental work has 
shown that the exciton binding in single-layer 
transition-metal dichalcogenides is extremely 
strong*”°, which results in a rate of radiative 
damping that is much greater than that of con- 
ventional semiconductors. 

Back et al. and Scuri et al. fabricated high- 
quality samples of single-layered molybdenum 
diselenide by encapsulating the material in 
atomically thin, inert films of hexagonal boron 
nitride, and then carried out their experi- 
ments at a low temperature (4 kelvin). Under 
these conditions, the authors show that radia- 
tive damping of the excitons is the dominant 
process. They demonstrate mirrors that can 
reflect a considerable proportion of the inci- 
dent light — up to 85% in Scuri and colleagues’ 
study — at the exciton resonance frequency of 
the material. 

Although the authors’ near-perfect mirrors 
work only for light from a narrow range of the
electromagnetic spectrum (in the vicinity
of the resonance frequency), the two studies
open up intriguing possibilities for the fields 
of nanophotonics and quantum optics. For 
instance, quantum nonlinear optics requires 
strong interactions between photons at the 
single-photon level, which is difficult to achieve 
in conventional materials. The authors’ work 
shows that quantum nonlinear optics could 
be realized in single-layer transition-metal 
dichalcogenides because of the extremely 
strong light-matter interactions that can be 
achieved”. 

The authors also demonstrate that the 
application of a voltage causes the mirrors 
to switch from being highly reflective to 
highly transparent. Such mirrors could there- 
fore be used as light modulators, or as other 
reconfigurable components, in optical and
optoelectronic systems. Moreover, the excitons 
in single-layer transition-metal dichalcogenides 
have a feature known as the internal-valley 
degree of freedom’, which might enable the 
mirrors’ reflectance to be controlled purely by 
varying the polarization of the incident light. 

About a decade ago, during the early stages 
of research on 2D materials, many scientists 
were asking whether a single layer of atoms 
could be highly reflective. Thanks to Back et al. 
and Scuri et al., we now know that the answer 
is yes. 
Peptide signal alerts
plants to drought 


It is thought that plants sense water availability in the soil as a way of anticipating 
drought. The identification of a peptide expressed when water is scarce offers a 
chance to unravel the underlying molecular mechanism. SEE LETTER P.235 


ALEXANDER CHRISTMANN & ERWIN GRILL 


Because plants cannot move to escape
adverse conditions, they must
continuously monitor environmental
cues to survive when conditions change. Plants 
can sense interactions with other organisms, 
such as bacteria, and can monitor light con- 
ditions across the spectrum, from ultraviolet 
to far red. The molecular mechanisms that 
facilitate those capacities are well understood. 
But how plants sense drought, cold and salt has
remained an enigma’. On page 235, Takahashi
et al.’ report the identification of a peptide that
is generated in response to a water deficit in 
plants. 

Drought, cold or salty conditions can affect a plant’s water status. Such conditions result in the synthesis of the hormone abscisic acid (ABA), which can regulate the plant’s water levels. Stomatal pores in leaves enable plants to take up the carbon dioxide required for photosynthesis, but water vapour can escape through them. ABA can trigger a reduction in how fully stomatal pores are opened, helping to conserve water.

The molecular basis of the link between 
water deficit and the induction of ABA 
synthesis has been a mystery. Using the plant 
Arabidopsis thaliana as a model system, 
Takahashi and colleagues investigated whether 
members of the CLE family of secreted pep- 
tides might have a role in this process. There 
are more than 30 members of this family, and 
they are generated by an enzyme-mediated 
cleavage event. These peptides are involved 
in diverse biological processes. For example,
CLAVATA3 controls the fate of stem cells, and
TDIF regulates the formation of the vasculature, the water-transport tissues of plants.

Takahashi et al. tested 27 CLE peptides for 
their ability to stimulate ABA synthesis, which is 
known to occur in the vasculature in response
to drought. Enzymes called NCEDs cleave a
carotenoid precursor molecule in the pathway
that gives rise to ABA, and the expression of
the gene NCED3 is induced by drought. The
authors administered CLE peptides to the roots 
of plants, and monitored whether this treatment 
induced NCED3 in leaves. They found that, at 
low levels of peptide application, only CLE25 
was active in regulating NCED3 expression. 
CLE25 treatment resulted in an increase in 
ABA levels and a decrease in stomatal opening. 
The CLE25 gene was rapidly expressed in roots 
in response to drought, and CLE25-deficient 
mutant plants failed to induce NCED3 expres- 
sion in response to dehydration. The formation 
of CLE25 in the root or shoot was enough to 
induce NCED3 in response to dehydration. 

The authors tested groups of receptors 
known to recognize CLE peptides, and identi- 
fied the receptor proteins BAM1 and BAM3 as 
being necessary for CLE25-induced responses. 
A series of grafting experiments clarified 
how this system works. If roots containing 
mutations in both the BAM1 and BAM3 genes
were grafted to wild-type shoots, the application of CLE25 to the plant’s roots led to NCED3
expression in the shoot. However, if the plant
was a graft between wild-type roots and shoots 
that had mutations in both the BAM1 and 
BAM3 genes, NCED3 was not expressed in the 
shoot in response to root application of CLE25. 

These results are consistent with a model in 
which CLE25 expressed in the roots can travel 
to the leaves and bind to BAM1 or BAM3 
(Fig. 1). The authors confirmed this pattern of 
CLE25 mobility by using a mass-spectrometry 
technique to identify CLE25 peptides that 
had travelled from the root to the leaf. Little is 
known about how CLE peptides travel within 
the plant, and not all such peptides travel as far 
as CLE25: CLAVATA3 moves only a few layers
of cells, for example.

Takahashi and colleagues’ findings open up 
potential avenues for determining the long- 
sought molecular events that occur when 
a water deficit is initially sensed. The steps 
leading to CLE25 expression in response to 
dehydration are unknown, and their discovery would shed light on this matter. And many questions remain about how CLE25 levels are regulated. How does the presumed cleavage of CLE25 occur? Chemical modifications to CLE25, including the hydroxylation of proline amino-acid residues and possibly the addition of sugar groups, might be crucial for its activity. Whether such modifications are necessary for the function of CLE25 in the drought-sensing process should be investigated.

Figure 1 | A peptide aids a plant’s response to drought. In drought, plants generate the hormone ABA, which can help to regulate plant water levels using processes such as the closure of stomatal pores, through which water escapes from leaves. However, the steps that occur between a plant sensing drought and the production of ABA were previously unknown. Using the plant Arabidopsis thaliana, Takahashi et al. report that the peptide CLE25 is activated in response to drought and is a mobile signal, moving from the roots to the leaves. The authors propose BAM1 and BAM3 proteins as receptors for CLE25, and their results indicate that interactions of CLE25 with these receptors lead to expression of the carotenoid-cleaving enzyme NCED3. The action of this enzyme generates an ABA precursor molecule, which is converted to the active ABA signal, facilitating changes that help the plant to cope with a water shortage.

The molecular mechanism of CLE25 action 
might be evolutionarily conserved in other 
plants. The results of Takahashi and colleagues 
suggest that the CLE25 peptide is generated by 
the enzymatic cleavage of a precursor protein 
that generates a 12-amino-acid peptide. In 
our own analysis of gene sequences, we note 
that the sequence of this CLE25 peptide in 
A. thaliana is identical to that of many other 
species, including beet (Beta vulgaris), poplar 
(Populus trichocarpa), rice (Oryza sativa) and 
maize (corn; Zea mays). 

Previous analysis revealed that water deficit can result in tension in the vasculature that can serve as a signal for ABA induction. Transport of CLE25 from the roots to the leaves is likely to be much slower than the immediate relay of this tension cue. Whether this cue and CLE25 act together or independently needs to be addressed. BAM1 and BAM3 are linked to the maintenance of meristem structures, which contain stem cells, and to vasculature development. Whether these receptors use the same signalling pathways for those developmental processes as the ones used in this drought response also awaits further analysis.

The authors’ identification of this role for 
CLE25 provides an intriguing insight into the 
regulatory interaction network that plants use 
to optimize their performance and viability 
under drought conditions. Water deficit is 
the major limiting factor for crop yields, and 
an improved understanding of the molecu- 
lar strategies used by plants to cope with this 
environmental challenge might reveal ways
of boosting crop resilience and ensuring stability in the future.
North Atlantic
circulation slows down


Evidence suggests that the circulation system of the North Atlantic Ocean is in a
weakened state that is unprecedented in the past 1,600 years, but questions remain 
as to when exactly the decline commenced. SEE ARTICLE P.191 & LETTER P.227 


SUMMER K. PRAETORIUS 


The warm, salty waters of the Gulf Stream make a northeasterly meander across the Atlantic Ocean, eventually forming the North Atlantic Current, which then funnels into the Nordic Seas. In the chill of winter, these waters cool and descend with the heavy load of their salinity. This deep convection is a key part of the Atlantic meridional overturning circulation (AMOC; Fig. 1), which can be thought of as an ocean conveyor belt that releases heat to the atmosphere above the North Atlantic Ocean before travelling through the abyssal ocean to resurface in other areas of the world.

Given the importance of the AMOC to heat exchange between the ocean and the atmosphere, the varying strength of this system is thought to have major impacts on the global climate, and has been implicated widely in some of the most remarkable and abrupt climate changes of the past. Direct measurements of the modern AMOC flow rates show a decline in its strength in the past decade. Reconstructions of the natural variability and long-term trends of the AMOC are needed, however, to put these recent changes in context. In this issue, Caesar et al. (page 191) and Thornalley et al. (page 227) report on past AMOC variability using different approaches. Both conclude that the modern AMOC is in an unusually subdued state, but they diverge in the details of how and when the AMOC’s decline commenced.

Caesar and colleagues inferred 
changes in the strength of the AMOC 
in the past century from patterns of 
anomalies in sea surface temperature 
(SST) that arise in the North Atlantic 
when the AMOC weakens. The weak- 
ening leads to a warming in the Gulf 
Stream and a cooling in the subpolar 
gyre — the cyclonic system of wind- 
driven ocean currents that lies to the 
south of Iceland (Fig. 1). Although the 
link between the relatively cool SSTs of the North Atlantic’s subpolar gyre and a slowdown of the AMOC has been studied previously, the main advance of Caesar and colleagues’ work is their comprehensive comparison of global SST data sets with state-of-the-art, high-resolution climate models.

Figure 1 | The Atlantic meridional overturning circulation (AMOC) and the subpolar gyre. The AMOC is an ocean circulation system that consists of warm surface currents (orange) and cold deep-water return flows (blue), as shown in this simplified representation. The surface currents include the Gulf Stream, which feeds a branch of the AMOC known as the North Atlantic Current. The deep-water return flows start from three branches that merge into the North Atlantic Deep Water. Thornalley et al. used measurements of silt in sediment cores to reconstruct the flow speed of the AMOC in the past 1,600 years; the black star indicates the approximate location at which the sediment cores were collected. Caesar et al. analysed temperature anomalies in the North Atlantic subpolar gyre (dashed line) to infer changes in AMOC flow in the past century. Both studies conclude that the AMOC has weakened by about 15% during the periods considered, but they differ on when the flow started to decline.

The authors’ data analysis shows that this bipolar pattern of cooling and warming emerged in the mid-twentieth century. When they performed climate simulations under a 1% yearly increase in carbon dioxide, the model produced a pattern of SST anomalies in the North Atlantic similar to that seen in the observational data, and demonstrated that this pattern was associated with a decline in AMOC strength. The authors then calibrated the model’s results with their SST data to estimate that the AMOC has declined by about 15% in the past half-century. They infer that the slowdown in the AMOC was probably a response to warming caused by anthropogenic greenhouse-gas emissions. A possible mechanism could be enhanced melting of the Greenland Ice Sheet, which adds fresh water to the surface ocean and reduces the density of the water that drives deep convection.
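The idealized scenario used in these simulations compounds year on year, so a 1% yearly increase means that atmospheric carbon dioxide doubles after roughly 70 years. The arithmetic below is a simple sketch that assumes a constant 1% annual growth rate from a starting concentration C_0 (the symbols C, C_0 and t_2 are introduced here for illustration); it is not a description of the authors’ specific model set-up.

```latex
C(t) = C_0\,(1.01)^{t}, \qquad C(t_2) = 2C_0
\;\Longrightarrow\;
t_2 = \frac{\ln 2}{\ln 1.01} \approx \frac{0.6931}{0.00995} \approx 69.7\ \text{years}
```

By the same reasoning, a quadrupling of carbon dioxide would take about twice as long, roughly 139 years.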

Thornalley et al. provide a longer-term 
perspective on changes in AMOC strength 
during the past 1,600 years using a proxy 
measurement — the ‘sortable-silt’ grain
size — of deep-sea sediment cores that
reflects the speeds of the bottom waters 
that flow along the path of the North Atlan- 
tic Deep Water, the deep-water return flow 
of the AMOC (Fig. 1). They combined this 
approach with a method similar to that used 
by Caesar and colleagues: they used past, 
near-surface temperature anomalies recorded 
in the marine sediments to provide additional 
constraints on the AMOC. 

The researchers found that the 
strength of the AMOC was relatively 
stable from about AD 400 to 1850, 
but then weakened around the start 
of the industrial era. This transition 
coincides with the end of the Lit- 
tle Ice Age — a multicentennial cold 
spell that affected many regions of the 
globe. Thornalley and colleagues
infer that the weakening of the AMOC 
at that time was probably a result 
of the input of fresh water from the 
melting of Little Ice Age glaciers and 
sea ice. They estimate that the AMOC 
declined in strength by about 15% 
during the industrial era, relative to 
its flow in the preceding 1,500 years. 
This is remarkably similar to Caesar 
and co-workers’ estimate, despite the 
different time periods on which they 
base their estimates. 

However, the roughly 100-year dif- 
ference in the proposed timing of the 
start of the AMOC decline in these 
two studies has big implications for 
the inferred trigger of the slowdown. 
Caesar et al. clearly put the onus 
on anthropogenic forcing, whereas 
Thornalley et al. suggest that an earlier 
decline in response to natural climate 
variability was perhaps sustained or 
enhanced through further ice melting 
associated with anthropogenic global 
warming. Nevertheless, the main 
culprit in both scenarios is surface- 
water freshening. 

The two studies are classic examples of ‘top-down’ and ‘bottom-up’ approaches, and so it is unsurprising that there is some misalignment
between them. Caesar et al. take the top-down
approach: their inferences of changes 
in the AMOC strength are made from 
reconstructions of regional and global SSTs 
that are derived from direct measurements of 
temperature. It is possible that regions other 
than the North Atlantic in which there has 
been decadal-scale variability in SSTs could 
influence the mean global SST from which 
the AMOC strength is calculated — although 
the authors do attempt to quell such doubts by 
showing that the subpolar-gyre SST anomaly 
is robust relative to the global mean SST for 
a subset of time periods (see Extended Data 
Fig. 2 in ref. 4). 

Thornalley and colleagues’ strategy is more 
of a bottom-up approach: they use a proxy 
for deep-water current strength to meas- 
ure AMOC strength more directly than do 
Caesar and co-workers. The weaknesses of 
this approach are that it accounts for only 
the local bottom currents at the sites from 
which the cores are taken, which might not 
capture the entire AMOC system, and that it 
could be susceptible to local nonlinear effects 
such as abrupt shifts in the position of the 
current. However, Thornalley et al. show that 
there is a striking correlation between their 
grain-size proxy and the measured density of the Labrador Sea Water (a major component of the North Atlantic Deep Water), as well as with the heat content of the subpolar gyre; these correlations shore up the bridge that links their localized proxy measurements to broader-scale changes in the AMOC.
For now, the timing of the AMOC decline remains a source of intrigue. Future studies that provide a more- […] were changing and when. It is — at least scientifically — reassuring to see that the present two studies converge on the conclusion that the modern AMOC is in a relatively weak state. However, in the context of future climate-change scenarios and a possible collapse in the AMOC in response to the continued melting of the Greenland Ice Sheet, it is perhaps less reassuring, because a weakened AMOC might lead to considerable changes in climate and precipitation patterns throughout the Northern Hemisphere.

The telomerase enzyme
and liver renewal

Cell-tracing analysis reveals that a dispersed group of cells in the mouse liver express the enzyme telomerase, which preserves chromosome ends. These cells contribute to liver maintenance and regeneration. SEE LETTER P.244

KENNETH S. ZARET

The enzyme telomerase maintains the length of specialized repetitive structures called telomeres, which are found at the ends of chromosomes. When they become damaged or shortened, telomeres can stop cells from dividing. Most cells in adult humans have very low or undetectable levels of telomerase and relatively short telomeres, and therefore have a limited ability to replicate. However, elevated telomerase levels are seen in various animal and human stem cells that must retain their replicative capacity for self-renewal. Telomerase defects are associated with tissue scarring (fibrosis) in the livers of both mice and humans, but which cells in the liver express telomerase, and whether they act as stem cells, has been unclear. On page 244, Lin et al. characterize this cell population in mice.

First, the authors identified telomerase-expressing cells in the mouse liver and tracked descendant cells. The group genetically engineered mice to contain a modified version of the gene Tert, which encodes a subunit of telomerase. When the mice are treated with a drug, this alteration causes cells expressing Tert to be indelibly labelled by a fluorescent protein. Once the genetically modified cells are triggered in this way, they and all their descendants produce the fluorescent protein, even if the cells no longer express Tert itself.

Lin et al. found that 3–5% of hepatocytes, the most prevalent type of cell in the liver, fluoresce in response to drug treatment. The authors confirmed, by quantitation of messenger RNA levels, that these cells express Tert. Next, they examined the livers of adult mice one year after drug treatment. The initially labelled cells (dubbed Tert^high) had given rise to clusters of descendants dispersed throughout the liver’s lobes, making up about 30% of the liver’s total mass (Fig. 1). Adult hepatocytes die and are replaced infrequently, so the increase in labelled cells over long periods indicates that the Tert^high hepatocytes contribute to the gradual renewal of the liver under normal conditions.

A key question is whether the Tert^high hepatocytes are a stable, self-renewing population. Alternatively, Tert could be expressed in certain cells for a period of time, then shut off in those hepatocytes and expressed in others. In support of the former case, when Lin et al. triggered fluorescent-protein labelling three times over a ten-week period, they found that the numbers of labelled hepatocytes were comparable to those for a single trigger. Next, they showed that 75% of labelled hepatocytes expressed high levels of Tert mRNA when they were examined a month after a single drug treatment, whereas only 18% did so after a year, indicating that, as the population gradually expands, Tert^high cells not only self-renew but also give rise to progeny that do not express Tert (Tert^low). Finally, the researchers demonstrated that Tert^high hepatocytes proliferate more than Tert^low cells, whereas Tert^low cells exhibit higher expression of genes relating to metabolism and biosynthesis than do Tert^high cells.

Taking these data together, the authors suggest that Tert^high hepatocytes behave like stem cells. But before concluding that the Tert^high cells are bona fide stem cells for the liver, it will be necessary to determine whether the Tert^high population becomes exhausted or remains at similar levels in older mice (because hepatocytes are still renewed in ageing mice), and whether Tert^low cells convert to Tert^high over longer periods than those used here (which would indicate that this population is not acting as stem cells). It will also be interesting to determine the processes by which cells transition from Tert^high to Tert^low, and how this change relates to homeostatic control of liver mass.

Figure 1 | Lineage tracing in the liver. Lin et al. characterize the hepatocyte cells in the mouse liver that express high levels of the gene Tert, which encodes a subunit of the enzyme telomerase. The authors generated mice that carry a genetically engineered version of Tert: when the mice are treated with a drug, any cells expressing Tert are indelibly labelled with a fluorescent protein. Those cells and all their descendants fluoresce, and so can be tracked. Only 3–5% of cells fluoresced immediately after drug treatment. One year later, about 30% of cells fluoresced, but most of these did not express Tert, indicating that the rare Tert-expressing cells give rise to new hepatocytes to help regenerate the liver. If the Tert-expressing cells are genetically ablated, the liver is susceptible to scarring (fibrosis) after toxin damage.

Importantly, stem cells typically reside in a special tissue compartment, or niche, that supports their regenerative capacity. Yet the Tert^high cells are dispersed throughout the liver. This dispersal of Tert^high cells is interesting because hepatocytes reside in different zones in each lobe of the liver, and earlier studies implicated one zone or another as being more relevant to liver regeneration. By contrast, Lin et al. provide evidence for a ‘distributed model’ for hepatocyte renewal. The research indicates that, although the Tert^high hepatocytes possess features of stem cells, those features are not of a conventional type.

In the past three years, one regenerative hepatocyte population near the central vein has attracted particular attention. The population responds to venous signals to self-renew during homeostasis, producing progeny that migrate outwards from the central zone. Lin et al. found a few Tert^high hepatocytes in the central zone in healthy livers, but these cells did not reside close enough to the central vein to respond to its signals. However, when the authors damaged the central-vein zone, Tert^high descendants appeared there and responded to venous signals. Moreover, after damage to the liver tissue in another region, around the portal vein, hepatocytes descended from Tert^high cells appeared abundantly in the periportal and mid-lobular zones, and the researchers found that ablation of Tert^high hepatocytes impaired this regenerative response, leading to liver fibrosis. Taking the above findings together with those of other studies of liver injury, it seems that various types of hepatocyte (as well as cells from the bile duct) can regenerate the mouse liver under a range of damage conditions.

In the future, it will be crucial to assess how relevant these findings in mice are to human liver regeneration. The fact that ablation of Tert^high hepatocytes results in fibrosis in the injured mouse liver seems to support relevance for humans, because people who harbour mutations in TERT and genes that encode other telomere-related factors can also exhibit fibrosis and cirrhosis (the latter being a predictor of liver cancer). However, Tert^high hepatocytes have not been seen in human livers — although the possibility has not yet been assessed with the sensitivity of the genetic-labelling approach used in mice by Lin and colleagues. An alternative explanation for diseases in humans who have telomerase-related mutations is that excessive telomere shortening in early development might affect many organ progenitors in a nonspecific way.

More-detailed studies in humans will be needed to confirm how telomerase-based regeneration forestalls liver disease, and possibly liver cancer. Nevertheless, Lin and colleagues’ study provides insight into a previously unidentified, dispersed-cell mode of liver regeneration.

EVOLUTION

Backbone of RNA
viruses uncovered


The evolutionary history of viruses is largely unknown. Large-scale discovery of 
vertebrate RNA viruses shows that, although viruses often jump between hosts, 
most have co-evolved with their hosts over millions of years. SEE ARTICLE P.197


MARK ZELLER & KRISTIAN G. ANDERSEN 


Many human diseases, from the common cold to deadly haemorrhagic fevers, are caused by RNA viruses. Most of these viruses are thought to have originated from close relatives that infected mammals, and so the majority of virus-discovery studies have focused on mammals and birds. RNA viruses, however, are probably older than the last common ancestor of life on Earth. Detailed genetic information for RNA viruses from other classes of vertebrate is sorely needed if we are to fully understand long-term virus evolution. On page 197, Shi et al. report the discovery of previously unidentified vertebrate RNA viruses from across evolutionary timescales.

The authors analysed the viruses in 186 vertebrate species using an approach called metatranscriptomic sequencing, in which all of the RNA present in a sample is sequenced. The samples were taken from species of fish, amphibian and reptile — every vertebrate class except mammals and birds. In these samples, Shi and colleagues discovered a total of 214 viruses, dramatically increasing the number of known RNA viruses in each vertebrate class. For example, they identified more than 20 RNA viruses that infect amphibians, whereas just a few had previously been identified.

© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

The analysis also revealed an astonishing 
level of biodiversity — the researchers iden- 
tified previously unknown viruses in almost 
every RNA-virus family known to infect 
mammals. These include viruses highly 
pathogenic to humans, such as influenza 
virus, arenaviruses and filoviruses, that 
have not previously been reported in fish or 
amphibians. 

Shi et al. used this information to construct 
phylogenetic trees that describe the evolution- 
ary relationships between viruses. They found 
that the phylogenies of RNA viruses were 
broadly comparable to those of the viruses’ 
vertebrate hosts. This shows that RNA viruses 
followed a similar evolutionary trajectory to 
vertebrates, and have co-evolved with their 
hosts over millions of years (Fig. 1). The evo- 
lution of vertebrates began more than 500 mil- 
lion years ago — vertebrate life then divided 
into several classes of fish, followed by the evo- 
lution of amphibians that moved on to land 
(http://www.onezoom.org). The authors’ findings
indicate that mammalian RNA viruses
probably originated from viruses that infected 
fish, and then followed vertebrates on to land. 

However, the researchers also show that some 
viruses can infect multiple hosts, indicating that, 
in addition to co-evolution, viruses have made 
jumps between species. In fact, many virus 
outbreaks in humans are the result of animal- 
to-human transmission, as exemplified by the 
recent Ebola epidemic in West Africa. Most
cross-species transmission events result in 
limited or no onwards transmission (the virus 
typically continues to circulate only temporar- 
ily in the new host species), and the ability of 
a virus to establish itself depends on a range 
of factors, including host divergence. Thus,
transmission between animals belonging to 
the same vertebrate class (bats to humans, for 
example) is more likely than that between ani- 
mals belonging to different vertebrate classes 
(such as reptiles to mammals). But Shi and 
colleagues’ phylogenies reveal that viruses 
regularly jump between vertebrate classes, 
with successful onwards transmission that can 
continue for millions of years. 

NEWS & VIEWS | RESEARCH

[Figure: vertebrate classes (jawless fish, cartilaginous fish, ray-finned fish, lungfish, amphibians, mammals, reptiles and birds) shown along a timeline running from millions of years ago to the present.]

Figure 1 | Tracking the evolution of RNA viruses. Shi et al. sequenced RNA viruses present in various classes of vertebrate, and constructed trees of virus evolution. Over a period of 525 million years, vertebrates branched off into several classes. The beginning of each coloured block arrow indicates the divergence between a vertebrate group and that below it in the figure; the beginning of the darker shading indicates the time that the most recent common ancestor of currently extant members of a class arose. The authors found that RNA viruses co-diverged with their vertebrate hosts (black lines indicate virus evolution). Each vertebrate class is dominated by its own set of RNA viruses; however, occasional cross-species transmissions occur (dashed arrows), introducing new viruses into a particular class. This phylogenetic tree is a simplified schematic to exemplify RNA-virus evolution as a whole, and does not reflect precise dates or cross-species transmission events found by the authors.

The current study greatly expands our knowledge of vertebrate virus evolution. However, it is not without limitations. First, excluding birds and mammals, there are more than 50,000 vertebrate species. And although the current study is one of the largest of its kind, Shi et al. sampled less than 0.5% of these species. Moreover, the authors focused their sampling towards common taxa such as ray-finned fishes, and included relatively few amphibians. This means that the group’s findings represent only a minuscule fraction of the total diversity of RNA viruses. We are just scratching the surface of these viruses’ evolutionary history. Our understanding of viral evolution will continue to expand as we sample RNA viruses from across deeper evolutionary timescales.
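The sampling fraction quoted above follows from a simple ratio of the 186 species analysed to the more than 50,000 non-avian, non-mammalian vertebrate species. Treating 50,000 as a round figure is an assumption made here for illustration; because the text gives it only as a lower bound, the true fraction is smaller still.

```latex
\frac{186}{50\,000} = 0.00372 \approx 0.37\% < 0.5\%
```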

Another limitation of the current study is 
that — as is typical for this type of work — new 
viruses are identified on the basis of genetic 
similarity to those that have been sequenced 
previously. This strategy has the potential to 
introduce biases. It is therefore possible that 
there are entire groups of viruses yet to be dis- 
covered, because they cannot be detected using 
similarity-based approaches. 

Finally, it is becoming increasingly clear that 
only a tiny fraction of RNA viruses will ever 
infect humans, and the factors that contribute 
to virus emergence in humans are not fully 
understood. As Shi et al. show, phylogenetic 
analyses are a powerful tool for identifying 
cross-species transmissions that happened in 
the past. But they cannot be used to predict 
host jumps and virus emergence of the future 
— the complexity of successful cross-species 
transmission renders efforts to predict dis- 
ease emergence by mapping non-human virus 
diversity ineffective. Studies that give us a more fundamental understanding of RNA-virus evolution and diversity, as Shi and colleagues’ work does, will be crucial to inform future surveillance efforts in humans.

It took us many decades to understand the basics of the evolutionary history of vertebrates. It will probably take even longer before we can confidently say that we are beginning to understand the enormous diversity of RNA viruses and their complex relationships with humans and other vertebrates. Shi et al. have provided an exciting starting point from which to strike out towards this goal.
Reversible Mn2+/Mn4+ double redox in
lithium-excess cathode materials


Jinhyuk Lee, Daniil A. Kitchaev, Deok-Hwang Kwon, Chang-Wook Lee, Joseph K. Papp, Yi-Sheng Liu, Zhengyan Lun,
Raphaële J. Clément, Tan Shi, Bryan D. McCloskey, Jinghua Guo, Mahalingam Balasubramanian & Gerbrand Ceder


There is an urgent need for low-cost, resource-friendly, high-energy-density cathode materials for lithium-ion batteries to satisfy the rapidly increasing need for electrical energy storage. To replace the nickel and cobalt in current lithium-ion batteries, which are limited resources and are associated with safety problems, high-capacity cathodes based on manganese would be particularly desirable owing to the low cost and high abundance of the metal, and the intrinsic stability of the Mn4+ oxidation state. Here we present a strategy of combining high-valent cations and the partial substitution of fluorine for oxygen in a disordered-rocksalt structure to incorporate the reversible Mn2+/Mn4+ double redox couple into lithium-excess cathode materials. The lithium-rich cathodes thus produced have high capacity and energy density. The use of the Mn2+/Mn4+ redox reduces oxygen redox activity, thereby stabilizing the materials, and opens up new opportunities for the design of high-performance manganese-rich cathodes for advanced lithium-ion batteries.


Lithium-ion-based energy storage is becoming a pervasive technology in today’s society. Introduced in the early 1990s for use in portable electronics, it has now migrated to applications such as transportation and the grid, for which energy storage needs will soon dwarf the use in electronics. Indeed, today, with electric vehicles making up about 1% of all car sales, almost half of all Li-ion batteries produced are already directed towards transportation. These new applications increase the demand for safe high-energy storage at low cost and without relying on constrained natural resources. In this context, it is remarkable that almost all Li-ion cathode materials rely on only two transition metals, Ni and Co, which are the electroactive elements in the layered-rocksalt cathode materials in the Li(Ni,Mn,Co)O2 chemical space (NMCs). On one end of this compositional spectrum, LiCoO2 dominates the electronics sector, whereas Ni-rich materials are of interest for the automotive sector. Although Mn has been used in a spinel cathode, and Fe in the LiFePO4 olivine, these compounds suffer from low energy density. Given the limits of energy density that can be achieved with the layered NMCs and the potential resource constraints on cobalt, it is of interest to develop high-capacity cathode materials based on other redox metals. In particular, transition metals that can exchange two electrons are of interest for their ability to create high capacity, similar to the Ni2+/Ni4+ couple in NMC cathodes. Low cost and low toxicity make the Mn2+/Mn4+ couple particularly desirable for designing high-performance Li-ion batteries that are also inexpensive and eco-friendly.

Manganese is currently used in cathode materials, but mostly in the inert Mn4+ state, as in NMC cathodes, or for its Mn3+/Mn4+ couple, as in LiMn2O4 spinel. More recently, Mn3+ has been used in disordered-rocksalt-type cathodes, such as Li1.3Mn0.4Nb0.3O2, in which the low capacity from Mn3+/Mn4+ needs to be overcome by a large amount of oxygen redox, which can trigger O loss, resulting in substantial voltage and capacity fade. In Li4Mn2O5, a high initial capacity (>300 mAh g−1) is achieved by oxidizing Mn past the standard Mn3+/Mn4+ redox couple, but this causes substantial voltage and capacity fade in subsequent cycles. In our approach, we instead start from Mn2+ in the discharged state, so that a high theoretical capacity can be obtained by oxidizing to Mn4+ without relying on O redox. Cycling between two stable valence states of Mn, and limiting the O redox, is expected to improve the reversibility of the charge/discharge process. Reduction to Mn2+ has been observed by lithiation of amorphous Li1.5Na0.5MnO2.85I0.12, but as this cathode material is synthesized in the charged state, it does not enable Li to be brought into the Li-ion cell. The development of a high-performance Li-ion cathode based on the Mn2+/Mn4+ couple requires a material that forms in its discharged state, contains enough Mn2+ and Li+ ions to provide high capacity, and preferably crystallizes in a dense structure, such as the layered or disordered-rocksalt structure, to maximize its volumetric energy density. Introducing Mn2+ into the dense layered or disordered materials has been difficult, as the Li excess (x > 1 in LixTM2−xO2, where TM is a transition metal) required to achieve high practical capacity demands a high average transition-metal valence.
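The charge-balance constraint described above can be made concrete with a short Python sketch. It assumes ideal stoichiometry with fixed Li+, O2− and F− charges; the helper names `normalize_to_two_anions` and `average_tm_valence` are illustrative and do not come from the paper. It rewrites Li2Mn2/3Nb1/3O2F on the two-anion LixTM2−xO2 basis and shows how F substitution lowers the average transition-metal valence that a given Li excess demands.

```python
from fractions import Fraction

def normalize_to_two_anions(formula):
    """Rescale a composition so that O + F = 2, i.e. the LiMO2-type rocksalt basis."""
    scale = Fraction(2) / (formula.get("O", 0) + formula.get("F", 0))
    return {el: n * scale for el, n in formula.items()}

def average_tm_valence(x_li, y_f):
    """Average transition-metal valence v in Li_x TM_(2-x) O_(2-y) F_y.

    Charge neutrality with Li+, O2- and F-:
        x + (2 - x) * v = 2 * (2 - y) + y   =>   v = (4 - y - x) / (2 - x)
    """
    return (4 - y_f - x_li) / (2 - x_li)

# Li2Mn2/3Nb1/3O2F written with exact fractions
nb_compound = {"Li": Fraction(2), "Mn": Fraction(2, 3), "Nb": Fraction(1, 3),
               "O": Fraction(2), "F": Fraction(1)}
norm = normalize_to_two_anions(nb_compound)
print({el: round(float(n), 3) for el, n in norm.items()})
# {'Li': 1.333, 'Mn': 0.444, 'Nb': 0.222, 'O': 1.333, 'F': 0.667}

# Fluorine-free Li excess: Li1.3Mn0.4Nb0.3O2 needs a high average TM valence
print(round(float(average_tm_valence(Fraction(13, 10), 0)), 3))  # 3.857

# With F substitution the average TM valence drops to 3; with Nb fixed at 5+,
# the remaining cation charge places Mn at 2+
v = average_tm_valence(norm["Li"], norm["F"])
mn_valence = (v * (norm["Mn"] + norm["Nb"]) - 5 * norm["Nb"]) / norm["Mn"]
print(v, mn_valence)  # 3 2
```

For the fluorine-free Li1.3Mn0.4Nb0.3O2 the required average valence is about 3.86, whereas for Li2Mn2/3Nb1/3O2F it falls to 3, which, with Nb fixed at 5+, leaves Mn at 2+.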

In this work, we demonstrate that high capacity (>300 mAh g−1) and energy density (about 1,000 Wh kg−1) can be achieved in disordered-rocksalt Li-rich intercalation cathodes from Mn2+/Mn4+ double redox combined with a small amount of O redox. A critical step is that we are able to lower the Mn valence in the cathode material through a combined strategy of high-valent cation (Nb5+, Ti4+) substitution and O2− replacement by F−. This O2− replacement was recently shown to be aided by Li excess and cation disorder. We target the Mn2+-containing compositions Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F, which have a theoretical Mn2+/Mn4+ redox capacity of 270 mAh g−1 and 230 mAh g−1, respectively. Given the high Mn capacities, only a small amount of O redox is required for these materials to deliver a total capacity over 300 mAh g−1, mitigating problems related to O redox. Thus realized, high capacity from Mn2+/Mn4+ double redox opens new opportunities for the design of high-performance Li-ion cathode materials.

Fig. 1 | Design and structural characterization of Li2Mn2/3Nb1/3O2F. a, Theoretical Mn-redox capacity of various Mn-based cathode materials. b, The X-ray diffraction pattern of Li2Mn2/3Nb1/3O2F. c, EDS mapping


Structural characterization of Li2Mn2/3Nb1/3O2F
To evaluate the Mn2+/Mn4+ redox strategy, we first test a new disordered Li-rich material, Li2Mn2/3Nb1/3O2F (equivalent to Li1.333Mn0.444Nb0.222O1.333F0.667), synthesized by a mechanochemical ball-milling method. The combined presence of high-valent Nb5+ and low-valent F− sets up the charge balance to incorporate Mn as Mn2+ in the pristine Li2Mn2/3Nb1/3O2F material, leading to a very high theoretical Mn-redox capacity of 270 mAh g−1, which is more than twice that of a typical Mn-based Li-rich cathode material (Fig. 1a). In addition, the d0 configuration of Nb5+ (similar to that of Ti4+, V5+, Zr4+ and Mo6+) promotes the formation of a disordered-rocksalt structure. X-ray diffraction (XRD) patterns (Fig. 1b, Extended Data Table 1) and elemental analysis (Extended Data Table 2) show that the compound forms in a disordered-rocksalt phase with a composition close to the target composition. XRD refinement yields a lattice parameter of about 4.262 Å (Extended Data Table 1). Energy-dispersive spectroscopy (EDS) mapping on a Li2Mn2/3Nb1/3O2F particle, using a transmission electron microscope (TEM), reveals a uniform distribution of Mn, Nb, O and F (Fig. 1c). 7Li and 19F nuclear magnetic resonance (NMR) results reveal that some Li can be found in diamagnetic environments and some F in LiF-like environments (Extended Data Fig. 1, Methods section ‘Supplementary Note 1’). Although this suggests that small amounts of impurity phases (for example LiF, Li2O, Li2CO3) may be present in the as-synthesized Li2Mn2/3Nb1/3O2F sample, we cannot rule out the presence of diamagnetic or LiF-like local domains in the rocksalt phase. Indeed, no crystalline impurities could be detected with XRD. TEM shows that the primary particles are polycrystalline and made of crystalline grains about 15 nm in size (Extended Data Fig. 2). No amorphous components were detected in TEM, indicating that the electrochemical properties are predominantly determined by the Li2Mn2/3Nb1/3O2F phase. Scanning electron microscopy (SEM) shows that the primary particle size of the as-prepared Li2Mn2/3Nb1/3O2F compound is 100–300 nm (Fig. 1d), which is reduced to less than 100 nm after mixing with carbon black in a shaker mill for electrode fabrication (Fig. 1e).
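The 270 mAh g−1 theoretical Mn-redox capacity quoted above (and 230 mAh g−1 for the Ti analogue) follows from Faraday's law, Q = nF/(3.6 M) in mAh g−1 for n electrons per formula unit of molar mass M. The sketch below assumes standard atomic weights and that every Mn ion cycles fully between the stated valences; Li1.3Mn0.4Nb0.3O2 (Mn3+/Mn4+ only) is included merely as an illustrative point of comparison from the introduction, not necessarily one of the materials plotted in Fig. 1a.

```python
# Faraday constant in mAh per mole of electrons (96485.332 C/mol, 1 mAh = 3.6 C)
FARADAY_MAH_PER_MOL = 96485.332 / 3.6

# Standard atomic weights (g/mol)
ATOMIC_MASS = {"Li": 6.94, "Mn": 54.938, "Nb": 92.906, "Ti": 47.867,
               "O": 15.999, "F": 18.998}

def molar_mass(formula):
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())

def theoretical_capacity(formula, electrons_per_formula):
    """Gravimetric capacity (mAh/g) if `electrons_per_formula` e- are exchanged per formula unit."""
    return FARADAY_MAH_PER_MOL * electrons_per_formula / molar_mass(formula)

cases = {
    # name: (formula, Mn per formula unit * electrons per Mn)
    "Li2Mn2/3Nb1/3O2F, Mn2+->Mn4+": ({"Li": 2, "Mn": 2/3, "Nb": 1/3, "O": 2, "F": 1}, (2/3) * 2),
    "Li2Mn1/2Ti1/2O2F, Mn2+->Mn4+": ({"Li": 2, "Mn": 1/2, "Ti": 1/2, "O": 2, "F": 1}, (1/2) * 2),
    "Li1.3Mn0.4Nb0.3O2, Mn3+->Mn4+": ({"Li": 1.3, "Mn": 0.4, "Nb": 0.3, "O": 2}, 0.4 * 1),
}
for name, (formula, n_e) in cases.items():
    print(f"{name}: {theoretical_capacity(formula, n_e):.0f} mAh/g")
```

It prints about 270, 230 and 118 mAh/g, which illustrates the "more than twice" comparison made in the text.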


Electrochemical performance of Li2Mn2/3Nb1/3O2F

(Mn, Nb, O, F) on a Li2Mn2/3Nb1/3O2F particle. Scale bar, 100 nm. d, e, SEM images of Li2Mn2/3Nb1/3O2F: d, as-synthesized (scale bar, 400 nm) and e, shaker-milled with carbon black (scale bar, 200 nm).

Galvanostatic charge–discharge tests of Li2Mn2/3Nb1/3O2F at 20 mA g−1 show a discharge capacity of 238 mAh g−1 (708 Wh kg−1) between 1.5 V and 4.6 V, which increases to 277 mAh g−1 (849 Wh kg−1) and 304 mAh g−1 (945 Wh kg−1) with higher charge cut-off voltages of 4.8 V and 5.0 V, respectively (Fig. 2a–c). In a test between 1.5 V and 5.0 V at 10 mA g−1 (Fig. 2d), the discharge capacity further increases to 317 mAh g−1, delivering a very high energy content of 995 Wh kg−1 (3,761 Wh l−1). This discharge capacity of about 320 mAh g−1 and specific energy approaching 1,000 Wh kg−1 are among the highest values achieved by Li-ion intercalation cathodes. As the voltage window is narrowed to 2.0–4.8 V (2.3–4.6 V), the reversible capacity and energy density at 20 mA g−1 decrease to 233 mAh g−1 (180 mAh g−1) and 760 Wh kg−1 (600 Wh kg−1), respectively (Extended Data Fig. 3). The rate capability of Li2Mn2/3Nb1/3O2F is fairly good. Figure 2e compares the first-cycle profiles of Li2Mn2/3Nb1/3O2F at different rates between 1.5 V and 5.0 V. The material delivers as much as 226 mAh g−1 (695 Wh kg−1) at 200 mA g−1 and up to 140 mAh g−1 (410 Wh kg−1) at a very high rate of 1,000 mA g−1 (Extended Data Fig. 4). The data presented here were obtained on electrode films made of 60 wt% active material, but the performance is similar for electrodes with higher loadings of 70 wt% and 80 wt% (Extended Data Fig. 3).
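Two quantities implicit in the numbers above can be cross-checked: the average discharge voltage (specific energy divided by capacity) and the volumetric energy density (specific energy multiplied by density). The sketch assumes an ideal, fully occupied disordered-rocksalt cell with the XRD lattice parameter of 4.262 Å and the nominal composition; the paper does not state which density was used for the Wh l−1 figure, and the helper names are illustrative.

```python
AVOGADRO = 6.02214076e23  # 1/mol

def average_voltage(specific_energy_wh_kg, capacity_mah_g):
    # mAh/g equals Ah/kg, so (Wh/kg) / (Ah/kg) is directly in volts
    return specific_energy_wh_kg / capacity_mah_g

def rocksalt_density(a_angstrom, molar_mass_g_mol, formula_units_per_cell):
    """Ideal X-ray density (g/cm^3) of a cubic cell with lattice parameter a."""
    cell_volume_cm3 = (a_angstrom * 1e-8) ** 3
    cell_mass_g = formula_units_per_cell * molar_mass_g_mol / AVOGADRO
    return cell_mass_g / cell_volume_cm3

# Li2Mn2/3Nb1/3O2F has three anions per formula unit and the cubic rocksalt cell has
# four anion sites, so Z = 4/3 (assumes fully occupied sites, no vacancies).
rho = rocksalt_density(4.262, 132.47, 4 / 3)   # g/cm^3, numerically equal to kg/L
v_avg = average_voltage(995, 317)
print(f"average discharge voltage ~ {v_avg:.2f} V")
print(f"density ~ {rho:.2f} g/cm^3 -> volumetric energy ~ {995 * rho:.0f} Wh/L")
```

Under these assumptions the average voltage comes out near 3.14 V and the density near 3.79 g cm−3, giving roughly 3,770 Wh l−1, within 1% of the reported 3,761 Wh l−1.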

The voltage profiles of Li2Mn2/3Nb1/3O2F show no significant hysteresis and remain nearly identical during cycling, suggesting that structural changes and oxygen loss are small. Only upon very high-voltage charging above 4.7 V is an apparent voltage plateau observed, which is barely seen on discharge (Fig. 2f). As Li2Mn2/3Nb1/3O2F delivers a higher capacity than its theoretical Mn capacity (270 mAh g−1), we expect that the charge plateau at about 4.8 V is accompanied by O oxidation. The asymmetry in charge/discharge voltage is similar to previous observations in which the O-oxidation plateau is not recovered on discharge. Nevertheless, this plateau in Li2Mn2/3Nb1/3O2F appears only after charging above about 250 mAh g−1, leading to less voltage hysteresis than in other Mn-redox-active disordered compounds, in which the O-oxidation plateau typically occurs much earlier in the charge. The smaller amount of O oxidation and the negligible changes in the voltage profile of Li2Mn2/3Nb1/3O2F are further supported by differential electrochemical mass spectrometry (DEMS) results, which show negligible O2(g) evolution up to a charge of 5 V (Extended Data Fig. 5, Methods section ‘Supplementary Note 2’). In addition, voltage fade is small in this material (Extended Data Fig. 6, Methods section ‘Supplementary Note 3’). These results indicate that our strategy of moving to Mn2+ compounds to increase the Mn-redox capacity at the expense of O redox is successful in increasing capacity and reversibility.
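The argument that capacity beyond 270 mAh g−1 must come from something other than Mn redox can be expressed as simple bookkeeping. This is a simplification introduced here, not the authors' analysis: it assigns charge to Mn first (capped at Mn4+), ignores side reactions such as electrolyte oxidation, and therefore gives only a lower bound on non-Mn charge compensation, since the spectroscopy and calculations in this paper indicate that Mn and O redox partly overlap. The helper names are hypothetical.

```python
FARADAY_MAH_PER_MOL = 96485.332 / 3.6  # mAh per mole of electrons
MOLAR_MASS = 132.47                    # g/mol, Li2Mn2/3Nb1/3O2F
MN_ELECTRONS_MAX = (2 / 3) * 2         # 2/3 Mn per formula unit, Mn2+ -> Mn4+

def li_removed(capacity_mah_g, molar_mass=MOLAR_MASS):
    """Li (= electrons) exchanged per formula unit for a given specific capacity."""
    return capacity_mah_g * molar_mass / FARADAY_MAH_PER_MOL

def split_charge(capacity_mah_g):
    """Attribute charge to Mn first, capped at Mn4+; return (x, Mn part, remainder)."""
    x = li_removed(capacity_mah_g)
    mn_part = min(x, MN_ELECTRONS_MAX)
    return x, mn_part, x - mn_part

for capacity in (250, 317):  # plateau onset and best discharge capacity quoted in the text
    x, mn_part, rest = split_charge(capacity)
    print(f"{capacity} mAh/g: x = {x:.2f}, Mn = {mn_part:.2f}, beyond Mn2+/Mn4+ = {rest:.2f} e-/f.u.")
```

At 317 mAh g−1 about 1.57 Li per formula unit are exchanged, of which at most 1.33 can be compensated by Mn2+/Mn4+, leaving roughly 0.23 e− per formula unit for other processes; at the 250 mAh g−1 plateau onset the Mn reservoir is not yet exhausted.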


Redox mechanism of Li2Mn2/3Nb1/3O2F
Fig. 2 | Electrochemical performance of Li2Mn2/3Nb1/3O2F. a–d, Voltage profiles and capacity retention of Li2Mn2/3Nb1/3O2F under various cycling conditions: a, 1.5–4.6 V, 20 mA g−1; b, 1.5–4.8 V, 20 mA g−1; c, 1.5–5.0 V, 20 mA g−1; and d, 1.5–5.0 V, 10 mA g−1. e, The first-cycle voltage profiles of

The redox mechanism and structural evolution of Li2Mn2/3Nb1/3O2F have been further studied by X-ray diffraction and by hard and soft X-ray absorption spectroscopies. Figure 3a shows a reversible lattice-parameter change upon cycling, as observed in other disordered-rocksalt intercalation cathodes. The shift of the (002) and (022) peaks to higher angles upon charge (indicating a decrease of the lattice parameter) is recovered on discharge. Upon charging, the a lattice parameter decreases from 4.258 Å to 4.130 Å at the top of charge and returns to 4.250 Å after full discharge.
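The link between a smaller lattice parameter and peaks moving to higher angle follows from Bragg's law for a cubic cell, d_hkl = a/√(h² + k² + l²) and λ = 2d sin θ. The sketch assumes the Cu Kα1 wavelength (1.5406 Å), since the text only specifies a Cu source, and uses the refined lattice parameters quoted above; the function name is illustrative.

```python
import math

CU_KALPHA1 = 1.5406  # Å; assumed Cu Kα1 wavelength (the text only says "Cu source")

def two_theta_cubic(a, hkl, wavelength=CU_KALPHA1):
    """Bragg angle 2θ (degrees) of reflection hkl for a cubic lattice parameter a (Å)."""
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)   # cubic interplanar spacing
    theta = math.asin(wavelength / (2 * d))     # Bragg's law: λ = 2 d sin θ
    return 2 * math.degrees(theta)

states = {"pristine": 4.258, "top of charge": 4.130, "after discharge": 4.250}
for state, a in states.items():
    peaks = ", ".join(f"({h}{k}{l}) at {two_theta_cubic(a, (h, k, l)):.2f}°"
                      for h, k, l in ((0, 0, 2), (0, 2, 2)))
    print(f"{state}: a = {a:.3f} Å -> {peaks}")
```

On charge, the (002) reflection should move from about 42.4° to about 43.8° 2θ and return close to its original position on discharge.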

Hard X-ray absorption spectroscopy (XAS) suggests that Mn2+ is oxidized towards Mn4+ during charge, a process that is reversed upon discharge. Figure 3b shows the Mn K-edge X-ray absorption near-edge structure (XANES) for Li2Mn2/3Nb1/3O2F at various states of charge and discharge. As the charge capacity increases from 0 to 135 mAh g−1 and 270 mAh g−1, the Mn K-edge shifts from an energy close to that of MnO (Mn2+ reference) to that of Mn2O3 (Mn3+ reference) and then partway to the energy seen in MnO2 (Mn4+ reference). Further charging to 360 mAh g−1 leads to only minor shifts. The Mn K-edge almost completely returns to its original position after discharge. Although the exact amount of each valence state cannot be quantified, as the near-edge structure is sensitive to both the oxidation state and the bonding environment, this result suggests that, on full charge, Mn2+ is oxidized to Mn4+ with some Mn2+ or Mn3+ ions remaining, and that full recovery to Mn2+ occurs on discharge. This interpretation is further supported by a derivative analysis of the Mn pre-edge at about 6,540 eV (Extended Data Fig. 7, Methods section ‘Supplementary Note 4’). Nb5+ does not participate in the redox processes (Extended Data Fig. 8).

200, 400 and 1,000 mA g−1. f, The first-cycle and second-charge profiles of Li2Mn2/3Nb1/3O2F under different voltage windows: 1.5–4.6 V, 1.5–4.8 V and 1.5–5.0 V. All tests were conducted at room temperature.

Whereas hard X-rays probe metal oxidation, soft X-ray absorption in total fluorescence yield mode can be used to investigate oxygen redox behaviour. Figure 3c traces the pre-edge features of the O K-edge XAS spectra of Li2Mn2/3Nb1/3O2F at various states of charge. The pre-edge is primarily associated with the O 1s to 2p transition, and its intensity is attributable to the density of unoccupied Nb 4d–O 2p and Mn 3d–O 2p hybridized states. We associate the pre-edge feature around 530.9 eV with unoccupied Nb 4d–O 2p hybridized states, as Mn2+ oxides (for example MnO) typically exhibit a pre-edge feature above about 533 eV. Charging to 135 mAh g−1 (theoretical Mn2+/Mn3+ limit) increases the intensity in the 529–532 eV range, which is typical of Mn3+ oxides such as Mn2O3. After charging to 270 mAh g−1 and 360 mAh g−1, an intensity gain is observed broadly between 528 and 530 eV. The largest intensity gain is centred around 529 eV (feature A), which is characteristic of Mn4+ oxides (for example MnO2, Li2MnO3), suggesting some Mn3+/Mn4+ oxidation on charge. Along with feature A, we see an intensity gain at 530–531 eV (feature B) after charging to 270 mAh g−1 and 360 mAh g−1. O oxidation has previously been shown to create a broad component around 530.2 eV in Mn-based disordered Li-rich cathodes. Therefore, feature B most probably indicates O oxidation occurring concurrently with Mn3+/Mn4+ oxidation. Discharging to 320 mAh g−1 restores the pre-edge shape and intensity, indicating Mn and O reduction. Hence, the electrochemical processes in this compound are reversible.

[Fig. 3 graphic: a, XRD patterns versus position (2θ, Cu) with refined a-lattice parameters; b, c, normalized absorbance versus energy (eV); states: before, 135c, 270c, 360c, 320dc]

Fig. 3 | Reaction mechanism of Li2Mn2/3Nb1/3O2F. a, XRD patterns of Li2Mn2/3Nb1/3O2F during the first cycle at 10 mA g−1 and the refined a-lattice parameters (c, charge; dc, discharge). b, c, Manganese K-edge XANES spectra from hard XAS (b) and O K-edge spectra from soft XAS (c; total fluorescence yield mode) during the initial cycle. Features A and B are described in the text. Plots are shown for Li2Mn2/3Nb1/3O2F before cycling; 135 mAh g−1 charged; 270 mAh g−1 charged; 360 mAh g−1 charged; and 320 mAh g−1 discharged after a 375 mAh g−1 charge.

Fig. 4 | Ab initio calculations of the redox mechanism of Li2Mn2/3Nb1/3O2F. a, b, Manganese (a) and oxygen (b) average oxidation state as a function of delithiation (x in Li2−xMn2/3Nb1/3O2F) and artificially introduced strain relative to the discharged state (x = 0). c, Change in the average oxidation state of Mn atoms that are coordinated by three or more fluorine atoms and those coordinated by two or fewer fluorine atoms. d, Change in the average oxidation state of O atoms with three, four and


Ab initio study of Li2Mn2/3Nb1/3O2F

In conventional Li–Mn oxides (for example LiMnO2, Li2MnO3), Mn oxidation up to Mn4+ is not competitive with O oxidation. The question is then why there is a partial overlap between these redox processes in Li2Mn2/3Nb1/3O2F. The main differences between Li2Mn2/3Nb1/3O2F and other Mn-based Li-excess oxides are the presence of fluorine and the relatively large lattice parameter (a is about 4.26 Å for Li2Mn2/3Nb1/3O2F compared with about 4.19 Å for Li1.3Mn0.4Nb0.3O2), which leads to a larger distance between Mn and its ligands. To elucidate the impact of these features on electrochemical behaviour, we study the effect of F coordination and lattice volume on the redox mechanism using density functional theory calculations. Note that although we compute the redox mechanism through electron titration as described in the Methods section, for clarity we refer to the degree of charge in terms of Li content.

Figure 4a, b shows the Mn and O average oxidation states as a function of delithiation (x in Li2−xMn2/3Nb1/3O2F) and for varying degrees of compressive strain relative to the fully relaxed discharged state, defined as $(l_0 - l)/l_0 \times 100\%$, where $l_0$ is the lattice parameter of the fully relaxed discharged state and $l$ is the compressed lattice parameter. The vertical bars account for the range of results obtained from various structural models of disordered-rocksalt Li2Mn2/3Nb1/3O2F. Initial delithiation up to x = 0.667 (theoretical Mn2+/Mn3+ limit) modifies only the Mn oxidation state, but further Li removal leads to concurrent O and Mn oxidation. A smaller lattice volume increases the degree of Mn oxidation at a fixed lithium level. At 270 mAh g−1 charge (x = 1.333), the experimentally observed lattice parameter reduction is 2.6% (Fig. 3a). At this strain, the calculations indicate average oxidation states of approximately Mn3.5+ and O1.8−, supporting the presence of an overlap between Mn3+/Mn4+ and O redox, consistent with our experimental results. This result seems to indicate that the large lattice parameter of Li2Mn2/3Nb1/3O2F is partly responsible for the overlap.
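The strain measure defined above, and the corresponding cell-volume change, can be evaluated directly from measured lattice parameters. The sketch takes the pristine experimental value of 4.258 Å as l0; in the calculations l0 is instead the computed relaxed lattice parameter, so this only illustrates the arithmetic, and the function names are illustrative.

```python
def compressive_strain_percent(l0, l):
    """Strain relative to the discharged state, (l0 - l) / l0 * 100%."""
    return (l0 - l) / l0 * 100.0

def volume_change_percent(l0, l):
    """Relative contraction of a cubic cell volume, (1 - (l / l0)**3) * 100%."""
    return (1.0 - (l / l0) ** 3) * 100.0

l0 = 4.258  # Å, pristine Li2Mn2/3Nb1/3O2F from the first-cycle XRD (assumed reference)
print(f"top of charge (4.130 Å): strain {compressive_strain_percent(l0, 4.130):.1f}%, "
      f"volume change {volume_change_percent(l0, 4.130):.1f}%")
# Lattice parameter implied by the 2.6% reduction quoted at 270 mAh/g
print(f"a at 2.6% strain: {l0 * (1 - 0.026):.3f} Å")
```

The full-charge contraction to 4.130 Å corresponds to about 3.0% linear strain (about 8.7% in cell volume), while a 2.6% reduction from 4.258 Å corresponds to a ≈ 4.147 Å.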

Figure 4c, d clarifies the impact of the local environment on the oxidation of Mn and O. Figure 4c compares the average oxidation state of Mn atoms that have two or fewer F ligands out of six anion neighbours with that of Mn atoms with three or more F ligands. At a given level of delithiation, Mn atoms with high F coordination are less oxidized than those with low F coordination, indicating that the substitution of O by F favours lower Mn oxidation states and thus leads to more redox overlap with oxygen. On the oxygen side, we observe more oxidation from O atoms with five and four Li nearest neighbours than from those with three Li neighbours (Fig. 4d). This trend is consistent with previous theory and experiments indicating that the lack of transition-metal–O hybridization in Li-rich environments raises the energy of some oxygen orbitals so that they can be more easily oxidized. Hence, the presence of Mn–F bonds, the Li-excess O environments and the larger Mn–O(F) bond distance in this material all contribute to some competitive Mn/O oxidation at very high states of charge (Fig. 4e). Nevertheless, owing to the large Mn2+/Mn4+ reservoir, O redox is much less needed in Li2Mn2/3Nb1/3O2F than in other Mn-based Li-rich materials, rendering Mn double redox an effective way to achieve high capacity without the typical polarization and capacity fade observed with excessive use of oxygen redox.

five Li nearest neighbours in the fully lithiated state (x = 0). The data in c and d were collected from model structures without strain and are representative of trends seen at all levels of strain. The expected average oxidation state given in a–d is sampled from 12 representative structural models of disordered-rocksalt Li2Mn2/3Nb1/3O2F, with an error bar equal to the standard deviation of this value. e, A schematic band structure of Li2Mn2/3Nb1/3O2F.


Structure and performance of Li2Mn1/2Ti1/2O2F

With diverse choices of high-valent cations, Mn2+/Mn4+ double redox can be realized in many different systems. As a demonstration, we have developed another new material, Li2Mn1/2Ti1/2O2F, in which Ti4+ is the high-valent cationic species. This material also forms a disordered-rocksalt phase (a = 4.206 Å), as can be inferred from the XRD pattern in Fig. 5a (Extended Data Table 1). SEM shows that, after mixing the compound with carbon black in a shaker mill, the average primary particle size is about 50 nm (Fig. 5b). TEM-EDS shows a uniform distribution of Mn, Ti, O and F (Fig. 5c). As in the case of Li2Mn2/3Nb1/3O2F, the primary particles of Li2Mn1/2Ti1/2O2F are polycrystalline and made of grains about 15 nm across (Extended Data Fig. 9). The 7Li and 19F NMR results suggest the possible presence of impurities (for example LiF, Li2O, Li2CO3) in our sample (Extended Data Fig. 1), but their amount is likely to be small, as no crystalline or amorphous impurities could be observed with XRD and TEM (Fig. 5a, Extended Data Fig. 9).

Li2Mn1/2Ti1/2O2F delivers high capacities similar to those of Li2Mn2/3Nb1/3O2F. When cycled between 1.6 V and 4.8 V (Fig. 5d) or between 1.5 V and 5.0 V (Fig. 5e) at 20 mA g−1, this material yields reversible capacities of 259 mAh g−1 (783 Wh kg−1, 2,756 Wh l−1) and 321 mAh g−1 (932 Wh kg−1, 3,281 Wh l−1), respectively. These values are again among the highest achieved by advanced cathode materials.

Fig. 5 | Structural characterization and electrochemical performance of Li2Mn1/2Ti1/2O2F. a, The XRD pattern of Li2Mn1/2Ti1/2O2F. b, SEM image of Li2Mn1/2Ti1/2O2F after shaker-milling with carbon black for electrode fabrication. Scale bar, 100 nm. c, EDS mapping (Mn, Ti, O, F) on a Li2Mn1/2Ti1/2O2F particle. Scale bar, 200 nm. d, e, Voltage profiles and capacity retention of Li2Mn1/2Ti1/2O2F when cycled at 20 mA g−1 between


Additional electrochemical data (rate tests, voltage window tests, change of the electrode formulation) are presented and discussed in Extended Data Fig. 10 and Methods section ‘Supplementary Note 5’.

As in Li2Mn2/3Nb1/3O2F, the voltage profiles of Li2Mn1/2Ti1/2O2F barely change after the first cycle, indicating a reversible reaction without a major structural change or O loss, as evidenced by the DEMS results (Extended Data Fig. 5). The O-oxidation plateau appears only after charging above 230 mAh g−1 (above about 4.6 V), which is substantially later than in other Mn-based Li-rich materials. Nevertheless, this plateau is slightly longer than in Li2Mn2/3Nb1/3O2F, probably because the smaller Mn2+ content of Li2Mn1/2Ti1/2O2F requires more O oxidation to achieve a given capacity.

Ex situ XRD of Li2Mn1/2Ti1/2O2F indicates a reversible change in lattice parameter, which shrinks from about 4.203 Å to about 4.105 Å after a 400 mAh g−1 charge and then recovers to about 4.194 Å after a 330 mAh g−1 discharge (Fig. 5f). Hard XAS confirms Mn2+/Mn4+ redox in the material (Extended Data Fig. 11, Methods section ‘Supplementary Note 6’). As in Li2Mn2/3Nb1/3O2F, additional capacity beyond the Mn-redox capacity is probably delivered by O redox. Given its high capacity and reversibility, Li2Mn1/2Ti1/2O2F has considerable potential as a high-performance Li-ion cathode.


Outlook for Mn2+/Mn4+ redox

Double redox couples are tremendously important for the development of advanced cathodes. Indeed, today’s NMC-based layered cathodes all rely to some extent on the Ni2+/Ni4+ double redox. With Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F, we have demonstrated that combined fluorination and high-valent cation substitution can introduce Mn2+/Mn4+ redox in a Li-excess disordered-rocksalt structure, which leads to high-capacity Mn-based Li-excess cathodes (capacity of >300 mAh g−1, energy density of around 1,000 Wh kg−1) without an excessive use of O redox. This discovery is important, as our strategy can be widely applied to design high-performance Mn-based Li-excess cathodes that do not suffer from structural degradation triggered by extensive O redox.
1.6 V and 4.8 V (d), and 1.5 V and 5.0 V (e). Cycling tests were conducted at room temperature. f, The XRD patterns of Li2Mn1/2Ti1/2O2F during the first cycle at 10 mA g−1 and the refined a-lattice parameters: before cycling; 120 mAh g−1 charged; 240 mAh g−1 charged; 400 mAh g−1 charged; and 330 mAh g−1 discharged after a 400 mAh g−1 charge.


The combination of Mn2+/Mn4+ redox with the cation-disordered structure and the partial replacement of O by F opens a large chemical space for new cathode materials. We expect to see considerable optimization through the use of different high-valent charge-compensating elements, as well as through minor alloying additions to further stabilize the structure and improve other aspects of performance. The disordered-rocksalt framework has previously shown high structural stability, and its compositional flexibility, enabled by not requiring the preservation of the layered cathode structure, can be used to tune not only the Li-excess level for Li transport, but also the content of F and of high-valent cations (such as those of Sn, Sb and Te). These handles can all be used to modify the size of the Mn2+/Mn4+ reservoir and to balance Mn- and O-redox activities. Critical directions for further research include finding ways of decreasing the voltage slope of these compounds, so that their high capacity and energy density can be delivered over a narrower voltage window, as well as investigating Mn dissolution, which often undermines the long-term cyclability of Mn-based cathodes. Strategies based on compositional modifications of the cathode material, on changes in the short-range cation distribution, on microstructure control (for example by surface coating) and on the use of tailored electrolytes should be explored to further develop high-performance Mn2+/Mn4+-based cathodes for advanced Li-ion batteries.


Online content

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0015-4.
METHODS

Synthesis. To synthesize Li2Mn2/3Nb1/3O2F, we used Li2O (Sigma-Aldrich, 97%), MnO (Alfa Aesar, 99%), Nb2O5 (Alfa Aesar, 99.9%) and LiF (Alfa Aesar, 99.99%) as precursors. For Li2Mn1/2Ti1/2O2F, we used Li2O (Sigma-Aldrich, 97%), MnO (Alfa Aesar, 99%), TiO2 (Alfa Aesar, 99.9%) and LiF (Alfa Aesar, 99.99%) as precursors. Except for Li2O, for which we used a 10% excess (rather than the stoichiometric amount) to compensate for possible loss of Li2O during synthesis, stoichiometric amounts of precursors were dispersed into Ar-filled stainless steel jars and then planetary ball-milled (Retsch PM 200) for 40 h at 450 r.p.m., during which Li2Mn2/3Nb1/3O2F or Li2Mn1/2Ti1/2O2F forms mechanochemically. The total amount of precursors in each 50-ml jar was approximately 1 g, and five 10-mm-diameter and ten 5-mm-diameter stainless steel balls were used as the grinding media.
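The precursor amounts follow from an element balance, for example 1/2 Li2O + 2/3 MnO + 1/6 Nb2O5 + LiF → Li2Mn2/3Nb1/3O2F. The sketch below splits a ~1 g batch accordingly with a 10% Li2O excess. Whether the ~1 g total includes the excess is not stated, so that is an assumption here; molar masses are computed from standard atomic weights, and the names `RECIPES` and `precursor_masses` are hypothetical.

```python
# Molar masses (g/mol) from standard atomic weights
MOLAR_MASS = {"Li2O": 29.879, "MnO": 70.937, "Nb2O5": 265.807, "TiO2": 79.865, "LiF": 25.938}

# Moles of precursor per mole of product, from the element balance, e.g.
# 1/2 Li2O + 2/3 MnO + 1/6 Nb2O5 + LiF -> Li2Mn2/3Nb1/3O2F
RECIPES = {
    "Li2Mn2/3Nb1/3O2F": {"Li2O": 1 / 2, "MnO": 2 / 3, "Nb2O5": 1 / 6, "LiF": 1.0},
    "Li2Mn1/2Ti1/2O2F": {"Li2O": 1 / 2, "MnO": 1 / 2, "TiO2": 1 / 2, "LiF": 1.0},
}

def precursor_masses(recipe, total_g=1.0, li2o_excess=0.10):
    """Split a total precursor mass (g) among precursors, with Li2O in excess."""
    moles = dict(recipe)
    moles["Li2O"] *= 1.0 + li2o_excess
    grams = {p: n * MOLAR_MASS[p] for p, n in moles.items()}
    scale = total_g / sum(grams.values())
    return {p: g * scale for p, g in grams.items()}

for product, recipe in RECIPES.items():
    masses = precursor_masses(recipe)
    print(product, {p: round(m, 3) for p, m in masses.items()})
```

Without the excess, the precursor masses per mole of product sum to the product molar mass (about 132.47 g mol−1 for the Nb compound), a quick check that the recipe is balanced.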
Electrochemistry. To prepare a cathode film from Li2Mn2/3Nb1/3O2F or Li2Mn1/2Ti1/2O2F, 480 mg of active compound and 240 mg of carbon black (Timcal, SUPER C65) were first mixed for an hour in an Ar-filled 45-ml zirconia vial with 10 g of 5-mm-diameter yttria-stabilized zirconia balls (Inframat Advanced Materials) as grinding media, using a SPEX 8000M Mixer/Mill. Polytetrafluoroethylene (PTFE, DuPont, Teflon 8A) was then added to the mixture as a binder, such that the cathode film consisted of the active compound, carbon black and PTFE in a weight ratio of 60:30:10. The weight ratio for cathode films with higher active-material loading was either 70:20:10 or 80:15:5. The components were then manually mixed using a mortar and pestle and rolled into a thin film inside an Ar-filled glove box. To assemble cells for all cycling tests, 1 M LiPF6 in ethylene carbonate and dimethyl carbonate (EC/DMC) solution (1:1, BASF), glass microfibre filters (Whatman) and Li metal foil (FMC) were used as the electrolyte, the separator and the counter electrode, respectively. Coin cells (CR2032) were assembled in an Ar-filled glove box and tested on a Maccor 2200 or an Arbin battery cycler at room temperature in galvanostatic mode unless otherwise specified. The loading density of the cathode film was about 6 mg cm−2. The specific capacity was calculated on the basis of the amount of active compound in the cathode film.
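The electrode formulation and rate convention above can be checked with simple arithmetic. The sketch assumes that the ~6 mg cm−2 loading refers to the whole film (active compound, carbon black and PTFE), which the text does not state explicitly; the helper names are hypothetical.

```python
def binder_mass_mg(active_mg, ratio=(60, 30, 10)):
    """PTFE mass (mg) so that active : carbon : PTFE matches the target weight ratio."""
    active_part, _carbon_part, binder_part = ratio
    return active_mg * binder_part / active_part

def areal_current_ma_cm2(film_loading_mg_cm2, active_fraction, specific_current_ma_g):
    """Areal current (mA/cm^2) for a rate specified per gram of active material."""
    active_g_per_cm2 = film_loading_mg_cm2 * active_fraction / 1000.0
    return active_g_per_cm2 * specific_current_ma_g

ptfe = binder_mass_mg(480)  # 480 mg active + 240 mg carbon black form the 60:30 part
print(f"PTFE to add: {ptfe:.0f} mg (total film mass {480 + 240 + ptfe:.0f} mg)")
print(f"20 mA/g on a 6 mg/cm^2, 60 wt% film: {areal_current_ma_cm2(6.0, 0.60, 20):.3f} mA/cm^2")
```

For the 60:30:10 film, 480 mg of active compound and 240 mg of carbon call for 80 mg of PTFE, and, under the stated loading assumption, a rate of 20 mA g−1 corresponds to about 0.072 mA cm−2.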
Characterization. XRD patterns of the as-prepared compounds and electrodes were collected on a Rigaku MiniFlex diffractometer (Cu source) in the 2θ range of 5°–85°. Rietveld refinement was completed with PANalytical X’Pert HighScore Plus software. Elemental analysis of the compounds was performed by Luvak Inc. with direct current plasma emission spectroscopy (ASTM E 1097-12) for Li, Mn, Nb and Ti, and with an ion-selective electrode (ASTM D1179-10) for F. SEM images were collected on a Zeiss Gemini Ultra-55 Analytical Field Emission SEM in the Molecular Foundry at Lawrence Berkeley National Laboratory (LBNL). For TEM sampling, particles were sonicated in ethanol and drop-cast onto an ultrathin carbon grid. Scanning TEM/EDS spectra were acquired from a few of the particles on a JEM-2010F microscope equipped with an X-max EDS detector in the Molecular Foundry at LBNL.

Hard X-ray absorption spectroscopy. We performed Mn, Nb and Ti K-edge XANES measurements in transmission mode at beamline 20BM of the Advanced Photon Source. The incident energy was selected using a Si(111) monochromator. We performed the energy calibration by simultaneously measuring the spectra of the appropriate metal foil. Harmonic rejection was accomplished using a Rh-coated mirror. The samples for the measurements were prepared from the Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F electrode films before and after first charging and discharging to designated capacities. The loading density of the films was approximately 10 mg cm−2. Additionally, we measured the spectra of some reference standards in transmission mode to aid interpretation of the XANES data. Data reduction was carried out using the Athena software.

Soft X-ray absorption spectroscopy. Soft XAS measurements on the O K-edge 
were performed on Beamline 8.0.1.1 (iRIXS endstation) at the Advanced Light 
Source, LBNL**. All the O K-edge XAS spectra were normalized by incident 
beam flux monitored by a gold mesh, which was located in front of the ultra-high- 
vacuum experimental chamber. The energy resolution of the O K-edge XAS spectra 
was set to 0.2 eV and a reference (anatase) TiO2 O K-edge XAS spectrum was also 
recorded for careful energy calibration during the XAS experiments. XAS spectra 
taken in total fluorescence yield mode were chosen and presented as Fig. 3c in this 
paper to represent the bulk-like information (typically a few hundred nanometres 
below the sample surface) from these cathode materials. In addition, sample prepa- 
ration and handling for X-ray spectroscopy were done in an air-free environment 
to avoid surface contamination and oxidation. 

Differential electrochemical mass spectrometry measurement. DEMS measurements
were used to detect and quantify O2 and CO2 gas evolved during charging
and discharging (Extended Data Fig. 5). The custom-built DEMS and the cell
geometry used are described in previous publications. The electrochemical
cells used with the DEMS device were prepared in a dry Ar glove box (<1 ppm
O2 and H2O, MBraun USA, Inc.) using the modified Swagelok design and the
same materials as discussed previously. The assembled cells were charged under a
static head of positive Ar pressure (approximately 1.2 bar) after being appropriately
attached to the DEMS. Throughout the charge, Ar gas pulses periodically swept
accumulated gases to a mass spectrometer chamber. The absolute sensitivity of the
mass spectrometer has been calibrated for CO2 and O2, so the partial pressures
of these gases can be determined. The amounts of CO2 and O2 evolved are then
quantified from the volume of gas swept to the mass spectrometer per pulse.

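
The pulse-by-pulse quantification described above can be pictured with the ideal-gas law, n = pV/(RT), summed over pulses. The sketch below is illustrative only: it assumes each pulse sweeps a known volume at a known temperature and that the calibrated spectrometer has already converted ion currents into partial pressures. The function names and all numbers are hypothetical, not the authors' analysis code.

```python
# Hypothetical sketch: ideal-gas quantification of DEMS pulses.
# Assumes partial pressures (Pa) have already been obtained from the calibrated
# mass spectrometer, and that each pulse sweeps a fixed headspace volume.

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def moles_per_pulse(partial_pressure_pa: float, swept_volume_m3: float,
                    temperature_k: float = 298.15) -> float:
    """Moles of one gas carried by a single pulse: n = p V / (R T)."""
    return partial_pressure_pa * swept_volume_m3 / (R * temperature_k)


def cumulative_umol_per_mg(partial_pressures_pa, swept_volume_m3: float,
                           active_mass_mg: float,
                           temperature_k: float = 298.15) -> list:
    """Running total of evolved gas, in umol per mg of active material."""
    total_mol = 0.0
    running = []
    for p in partial_pressures_pa:
        total_mol += moles_per_pulse(p, swept_volume_m3, temperature_k)
        running.append(total_mol * 1e6 / active_mass_mg)  # mol -> umol, per mg
    return running


# Illustrative numbers only: three pulses, 1 mL swept volume, 5 mg of active material.
print(cumulative_umol_per_mg([10.0, 20.0, 5.0], 1e-6, 5.0))
```

Normalizing by active mass, rather than by electrode mass, matches the per-mg-of-active-material units used for the O2 and CO2 yields in Supplementary Note 2.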
Solid-state NMR spectroscopy. We acquired all 7Li and 19F NMR data at room
temperature on a Bruker Avance 500 WB spectrometer (11.7 T) at Larmor
frequencies of about 194.4 MHz and 470.7 MHz, respectively. The data were obtained under
50-kHz magic-angle spinning (MAS) using a 1.3-mm double-resonance probe.
The 7Li and 19F chemical shifts were referenced against lithium fluoride powder
(LiF, δiso(7Li) = −1 ppm and δiso(19F) = −204 ppm). 7Li spin echo spectra were
acquired on as-synthesized Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F using a 90°
radio-frequency (RF) excitation pulse of 0.9 μs and a 180° RF pulse of 1.8 μs at
110 W. A recycle delay of 0.03 s was found to be long enough to ensure
complete relaxation of all Li signals between excitation pulses. Lineshape analysis
was carried out using the SOLA lineshape simulation package within the Bruker
TOPSPIN software. Because the resonant frequency range of the 19F nuclei in the
as-synthesized Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F samples is larger than the
excitation bandwidth of the RF pulse used in the NMR experiment, nine spin
echo spectra were collected on each sample, with the irradiation frequency varied
in steps equal to the excitation bandwidth of the RF pulse (330 ppm, or 155 kHz).
The individual sub-spectra were processed using zero-order phase correction so
that the on-resonance signal was in absorption mode. The nine sub-spectra
were then added to give an overall sum spectrum with no further phase correction
required. We note that this methodology, termed ‘spin echo mapping’, ‘frequency
stepping’ or ‘VOCS’ (variable offset cumulative spectrum), is required to
provide a large excitation bandwidth and uniformly excite the broad 19F signals.
Individual 19F spin echo spectra were collected using a 90° RF excitation pulse of
1.6 μs and a 180° RF pulse of 3.2 μs at 76.3 W (or 156 kHz), with a recycle delay of
0.05 s. For comparison, a spin echo spectrum was collected on LiF using similar
RF pulses but a longer recycle delay of 14 s. A 19F spin echo spectrum acquired
under the same conditions as the Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F spin
echo spectra, but on an empty rotor, revealed no significant background signal
from the NMR probe itself.
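
The pulse and stepping parameters above are mutually consistent, and a few lines of arithmetic show why. The sketch below assumes ideal rectangular pulses, for which a 90° pulse of length t90 corresponds to a nutation frequency ν1 = 1/(4·t90). It also uses the fact that a span in ppm multiplied by the Larmor frequency in MHz gives hertz. The carrier centre used for the stepped offsets is a hypothetical choice, because the text gives only the step size and the number of steps.

```python
# Sketch (assumptions flagged): pulse-length and frequency-stepping arithmetic.


def nutation_frequency_khz(t90_us: float) -> float:
    """RF field strength (kHz) implied by an ideal 90-degree pulse of t90_us microseconds."""
    return 1.0 / (4.0 * t90_us * 1e-6) / 1e3


def ppm_to_khz(span_ppm: float, larmor_mhz: float) -> float:
    """Convert a shift span in ppm to kHz (ppm x MHz = Hz)."""
    return span_ppm * larmor_mhz / 1e3


def stepped_offsets_ppm(n_steps: int, step_ppm: float, centre_ppm: float = 0.0) -> list:
    """Carrier offsets for n_steps sub-spectra spaced by step_ppm.

    centre_ppm is a hypothetical choice; only the step size is given in the text.
    """
    half = (n_steps - 1) / 2.0
    return [centre_ppm + (i - half) * step_ppm for i in range(n_steps)]


print(nutation_frequency_khz(1.6))  # 19F 90-degree pulse: about 156 kHz
print(nutation_frequency_khz(0.9))  # 7Li 90-degree pulse: about 278 kHz
print(ppm_to_khz(330.0, 470.7))     # 19F step: about 155 kHz
offsets = stepped_offsets_ppm(9, 330.0)
print(offsets[0], offsets[-1], 9 * 330.0)  # first/last carrier and total window in ppm
```

Stepping nine times by one excitation bandwidth covers roughly 3,000 ppm, which is why the sum spectrum can capture broad paramagnetic 19F signals that a single pulse cannot excite uniformly.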

Density functional theory calculations. Density functional theory (DFT) analysis of the
redox mechanism was performed with the Vienna Ab initio Simulation Package
(VASP) using the projector augmented-wave method. First, structural models
of the Li2Mn2/3Nb1/3O2F disordered rocksalt were obtained using a
cluster-expansion-based Monte Carlo simulation, chosen to find low-energy disordered
structures with representative short-range order while suppressing phase
separation. The cluster-expansion Hamiltonian used for the Monte Carlo simulations
decomposes the internal energy of a particular atomic configuration on a
rocksalt lattice into contributions from two-, three- and four-body
terms up to maximum interaction distances of 7.0 Å, 4.1 Å and 4.1 Å, respectively,
relative to an ideal rocksalt lattice with a primitive lattice constant of 3.0 Å, on top
of an electrostatic model based on the formal charges of all species. To obtain
the interaction terms, we first calculated 450 representative configurations of Li+,
Mn2+, Nb5+, O2− and F− on a rocksalt lattice within the Perdew–Burke–Ernzerhof
exchange-correlation functional, supplemented with the rotationally invariant
Hubbard U correction to the transition-metal d states to correct self-interaction
error (UMn = 3.9 eV, UNb = 1.5 eV, based on previously reported fits to oxide
formation enthalpies). These calculations were performed with a reciprocal-space
discretization of 25 Å−1, a 520-eV plane-wave cut-off, and 10~* eV and 0.02 eV Å−1
convergence criteria on total energy and interatomic forces, respectively. The strength of
each cluster interaction, as well as the dielectric constant, was then fitted using
ℓ1-regularized least-squares regression optimized by cross-validation, which
resulted in an out-of-sample error of 9 meV per atom.
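
ℓ1-regularized least squares with cross-validation can be sketched in a few dozen lines. This is a minimal sketch, not the authors' fitting code. It swaps in a plain iterative soft-thresholding (ISTA) solver and k-fold selection of the regularization strength. X is an assumed correlation matrix with one row per DFT configuration and one column per cluster function. One could, hypothetically, add a column holding the point-charge electrostatic energy, so that its fitted weight acts as an inverse dielectric constant. y holds DFT energies per atom. The data below are synthetic.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_ista(X, y, alpha, n_iter=5000):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by iterative soft-thresholding."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w


def cv_rmse(X, y, alpha, k=5, seed=0):
    """Root-mean-square out-of-fold prediction error for one alpha."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = lasso_ista(X[train], y[train], alpha)
        sq_errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.sqrt(np.mean(sq_errs)))


def fit_ecis(X, y, alphas, k=5):
    """Choose alpha by cross-validation, then refit on all configurations."""
    scores = {a: cv_rmse(X, y, a, k) for a in alphas}
    best = min(scores, key=scores.get)
    return lasso_ista(X, y, best), best, scores[best]


# Synthetic demo: 120 "configurations", 10 cluster functions, 2 non-zero interactions.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))
w_true = np.zeros(10)
w_true[[0, 3]] = [1.5, -2.0]
y = X @ w_true + 0.01 * rng.normal(size=120)
w, alpha, rmse = fit_ecis(X, y, [1e-4, 1e-3, 1e-2])
print(np.round(w, 3), alpha, rmse)
```

The cross-validated RMSE returned here plays the same role as the out-of-sample error quoted in the text. The ℓ1 penalty drives unimportant cluster interactions to exactly zero, which keeps the expansion sparse.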

The redox mechanism of Li2Mn2/3Nb1/3O2F was calculated on 12 structural
models of 36 atoms each, obtained from the Monte Carlo simulations described
above. Oxidation calculations were done using the hybrid Heyd–Scuseria–Ernzerhof
functional, with a 650-eV plane-wave cut-off, a 10 Å−1 reciprocal-space
discretization, and 10~° eV and 0.02 eV Å−1 convergence criteria on total energy and
interatomic forces, respectively. The fraction of exact exchange was set to 0.30 on
the basis of a calibration to the Kohn–Sham gaps of α-Mn3O4, γ-MnOOH and
β-MnO2 calculated within the G0W0 approximation, following previously reported
methodology for reproducing the redox competition between transition metals
and oxygen. To investigate the order in which the various redox couples are activated
in the material while suppressing major structural rearrangements, we trace the
oxidation state of each species (obtained from the magnetic moment of each atom) as
electrons are removed from the material and charge-compensated by a uniform
background charge, allowing the local atomic arrangements to relax at each step
but keeping the lattice fixed. Because the order of oxidation reactions is determined by
the character of the valence-band edge at various states of charge, such electron
titration provides an efficient way to examine the electronic contribution to the
redox mechanism.
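
Reading oxidation states from magnetic moments at each titration step can be illustrated with a simple nearest-reference rule. This is a hypothetical sketch. It assumes high-spin octahedral Mn with nominal spin-only moments of 5, 4 and 3 μB for Mn2+ (d5), Mn3+ (d4) and Mn4+ (d3). DFT moments usually fall somewhat below these values, so each Mn is assigned to the nearest reference. An O moment above an assumed 0.5 μB cut-off is read as a hole on oxygen, labelled "O-". Li, Nb and F are treated as redox-inactive.

```python
from collections import Counter

# Hypothetical reference moments (Bohr magnetons) and O-hole threshold.
MN_REFERENCE_MOMENTS = {2: 5.0, 3: 4.0, 4: 3.0}
O_HOLE_THRESHOLD = 0.5


def mn_oxidation_state(moment: float) -> int:
    """Assign the Mn oxidation state whose reference moment is closest to |moment|."""
    m = abs(moment)
    return min(MN_REFERENCE_MOMENTS, key=lambda q: abs(MN_REFERENCE_MOMENTS[q] - m))


def titration_step_summary(species, moments) -> Counter:
    """Count Mn oxidation states and oxidized O at one electron-removal step."""
    counts = Counter()
    for element, moment in zip(species, moments):
        if element == "Mn":
            counts[f"Mn{mn_oxidation_state(moment)}+"] += 1
        elif element == "O":
            counts["O-" if abs(moment) >= O_HOLE_THRESHOLD else "O2-"] += 1
    return counts


# Illustrative moments for a tiny cluster after some electrons have been removed.
print(titration_step_summary(["Li", "Mn", "Mn", "O", "O", "F"],
                             [0.0, 4.6, 3.1, 0.05, 0.8, 0.0]))
```

Tracking these counts step by step shows which couple (Mn2+/Mn3+, Mn3+/Mn4+ or O) is consumed first as electrons are removed.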
Supplementary Note 1. The absence of peaks other than those corresponding
to the disordered-rocksalt phases in the XRD data collected on as-synthesized
Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F suggests that the samples are fairly
phase-pure, without large amounts of crystalline impurities such as LiF, Li2O or Li2CO3.
Nevertheless, because small amounts of impurity phases or amorphous phases can
be invisible to XRD, we investigated further using 7Li and 19F NMR spectroscopy.

7Li spin echo NMR spectra obtained on as-synthesized Li2Mn2/3Nb1/3O2F and
Li2Mn1/2Ti1/2O2F samples are shown in Extended Data Fig. 1a, b. The data have
been fitted using a minimum of three Li sites: Li1, Li2 and Li3. The fits suggest that
about 78%–79% (±2%) of Li is in paramagnetic environments (Li1 and Li2 signals)
and about 21%–22% (±2%) of Li is in diamagnetic environments (Li3 signal). The
broad Li1 and Li2 resonances are ascribed to several paramagnetic Li environments
close to open-shell Mn with very similar shifts, resulting in overlapping
signals. Paramagnetic interactions between unpaired Mn d electrons and the Li
nuclei broaden the individual Li signals, with shifts centred around
64.9 ppm (Li1) and −27.4 ppm (Li2) for Li2Mn1/2Ti1/2O2F, and around 57.7 ppm
(Li1) and −25.5 ppm (Li2) for Li2Mn2/3Nb1/3O2F. The sharper Li3 resonance,
with a shift close to 0 ppm (−0.1 ppm for Li2Mn1/2Ti1/2O2F and −0.5 ppm for
Li2Mn2/3Nb1/3O2F), is ascribed to diamagnetic Li sites in the samples. Because
Li+, Nb5+ and Ti4+ do not have unpaired electrons, Li nuclei in diamagnetic
Li/Ti-rich and Li/Nb-rich domains in Li2Mn1/2Ti1/2O2F and Li2Mn2/3Nb1/3O2F have
a shift around 0 ppm that cannot be distinguished from that of potential Li2O,
LiF and Li2CO3 impurities, resonating at 2.8, −1 and 0 ppm, respectively. All
of these Li environments may contribute to the Li3 signal, and individual
contributions cannot be quantified. Indeed, the kind of local cation segregation that would
lead to Li/Ti- or Li/Nb-rich domains in our compounds has been observed in
several compounds, for example Li2MnO3-like domains in Li- and Mn-rich
layered Ni–Mn–Co materials, or Li3NbO4-like local domains in disordered
Li–V–Nb–O materials. A previous 7Li NMR study on paramagnetic lithium transition-metal
phosphate (LiMPO4) cathodes found that paramagnetic shift contributions
from distant M beyond the second metal coordination shell around the central
Li can be non-zero. This observation suggests that, in Li2Mn1/2Ti1/2O2F and
Li2Mn2/3Nb1/3O2F, Mn is likely to be more than 7 Å away from a Li nucleus for there
to be no paramagnetic shift contribution and an overall Li shift close to 0 ppm.
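
The paramagnetic and diamagnetic Li fractions quoted above are ratios of fitted integrals. The sketch below shows that bookkeeping with hypothetical areas. It assumes each site's integral already includes its spinning-sideband intensity, which can be large for paramagnetic signals and must be counted for the fractions to be quantitative.

```python
# Sketch: Li site fractions from fitted 7Li integrals (hypothetical areas).


def site_fractions(integrals: dict) -> dict:
    """Normalize each site's integral (centre band + sidebands) by the total."""
    total = sum(integrals.values())
    return {site: area / total for site, area in integrals.items()}


def paramagnetic_fraction(integrals: dict, paramagnetic_sites=("Li1", "Li2")) -> float:
    """Summed fraction of the sites assigned to paramagnetic environments."""
    fractions = site_fractions(integrals)
    return sum(fractions[s] for s in paramagnetic_sites)


example = {"Li1": 50.0, "Li2": 28.5, "Li3": 21.5}  # illustrative areas only
print(paramagnetic_fraction(example))  # about 0.785, i.e. roughly 78.5% paramagnetic Li
```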

19F spin echo sum spectra collected on as-synthesized Li2Mn2/3Nb1/3O2F and
Li2Mn1/2Ti1/2O2F are compared to the spin echo spectrum collected on crystalline
LiF powder in Extended Data Fig. 1c. Further details on how
the sum spectra were obtained can be found in the Methods section on solid-state
NMR spectroscopy, where we describe the ‘spin echo mapping’ method. The 19F
NMR data clearly indicate that most of the F is in paramagnetic
environments (that is, with Mn in the first, second and/or third metal coordination shell
around the F nucleus), giving rise to very broad overlapping NMR signals shifted
away from the LiF resonant frequency. Nevertheless, LiF-like F environments are
also observed as a sharp signal with a resonant frequency equal to that of pure LiF
(−204 ppm). Some of our current work on similar paramagnetic cation-disordered
oxyfluorides suggests that F nuclei directly bonded to the paramagnetic centre
(here Mn) are essentially invisible in the NMR spectrum, because the very strong
interaction with the unpaired electrons leads to extremely broad resonances with
very large shifts that are lost in the background. Hence, we suspect that our 19F NMR
data are not quantitative and that the proportion of F in paramagnetic environments
is even larger than that determined from the experimental observations. With this
in mind, the 19F NMR data confirm that most of the F has been integrated into the bulk
cation-disordered oxide lattice. Although the −204 ppm 19F signal can indicate a
LiF impurity in our samples, it can also indicate the presence of a small proportion
of LiF-like domains in the disordered oxyfluoride structure. This would
be consistent with recent theory work showing that the much higher energetic
cost of creating M–F bonds, as compared with Li–F bonds, results in the
incorporation of F in Li-rich (that is, LiF-like) local environments in cation-disordered
oxyfluoride materials.

In short, diamagnetic Li sites and LiF-like F environments observed with NMR
cannot be uniquely attributed either to local domains in the rocksalt structure
or to amorphous impurity phases, such as LiF, Li2O or Li2CO3, in our samples.
Hence, NMR gives an upper bound on the amount of impurity present in the
samples but does not enable us to obtain the exact amount of potential impurity
phases. In the extreme case in which all of the diamagnetic Li and LiF-like F signals
come from Li2O and LiF, the total weight fraction of impurity phases is estimated
to be no more than 6–7 wt%; it is likely to be less, as no crystalline impurities
were observed with XRD and no amorphous domains were observed in TEM
(Extended Data Figs. 2, 9). As a result, we are confident in stating that the
performance of Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F is predominantly determined
by the transition-metal oxyfluoride rocksalt phase.

Supplementary Note 2. Extended Data Fig. 5a, b shows the O2(g) and CO2(g)
evolution data from Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F during the initial charge
(1.5–5.0 V, 20 mA g−1), collected by DEMS measurements. The capacity observed
during this DEMS test is slightly smaller than that in a coin-cell test, because the
electrode films were made thicker (about 13 mg cm−2 versus about 6 mg cm−2 in
coin cells) for this measurement to enhance the gas evolution signals. We detect
negligible O2(g) evolution from both compounds upon first charging to 5 V. The total
amount of O2(g) evolved during the first charge is smaller than 0.01 μmol mg−1
(of active material) for both Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F, which
corresponds to less than 0.2% of the total oxygen content in the two materials. For
conventional layered Li- and Mn-rich cathodes, such as Li1.2Ni0.13Co0.13Mn0.54O2, oxygen
loss occurs predominantly in the form of O2(g) evolution, which starts above
4.5 V in the first charge and results in a loss of about 4%–5% of the total oxygen
content of the cathode material. Therefore, the remarkably small amount
of O2(g) evolved even up to 5 V demonstrates negligible oxygen loss from both the
Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F compounds.
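
The conversion from an O2 yield per mg of active material to a fraction of the total oxygen content is a short stoichiometric calculation. The sketch below uses the nominal formulas and standard atomic masses. The measured compositions in Extended Data Table 2 differ slightly, and that difference is ignored here. For the 0.01 μmol mg−1 upper bound, both compounds come out at roughly 0.1%, consistent with the "less than 0.2%" stated above.

```python
# Sketch: O2 yield (umol per mg of active material) -> fraction of lattice oxygen.
# Uses nominal compositions and standard atomic masses.

ATOMIC_MASS = {"Li": 6.94, "Mn": 54.938, "Nb": 92.906, "Ti": 47.867,
               "O": 15.999, "F": 18.998}

LI2MN2_3NB1_3O2F = {"Li": 2, "Mn": 2 / 3, "Nb": 1 / 3, "O": 2, "F": 1}
LI2MN1_2TI1_2O2F = {"Li": 2, "Mn": 1 / 2, "Ti": 1 / 2, "O": 2, "F": 1}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def o2_loss_fraction(o2_umol_per_mg: float, formula: dict) -> float:
    """Fraction of the O atoms in 1 mg of active material released as O2."""
    formula_units_umol = 1e3 / molar_mass(formula)  # 1 mg = 1e-3 g, expressed in umol
    oxygen_umol = formula_units_umol * formula["O"]
    return 2.0 * o2_umol_per_mg / oxygen_umol       # two O atoms per O2 molecule


for name, f in [("Li2Mn2/3Nb1/3O2F", LI2MN2_3NB1_3O2F),
                ("Li2Mn1/2Ti1/2O2F", LI2MN1_2TI1_2O2F)]:
    print(name, round(molar_mass(f), 2), f"{100 * o2_loss_fraction(0.01, f):.2f}%")
```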

Interestingly, we detect a noticeable amount of CO2(g) evolved from the
two materials (0.30 μmol mg−1 and 0.24 μmol mg−1 for Li2Mn2/3Nb1/3O2F and
Li2Mn1/2Ti1/2O2F, respectively), with much of the evolved CO2 appearing at
voltages lower than the threshold voltage (about 4.5 V) for decomposition of 1 M LiPF6 in
EC/DMC electrolyte. On the basis of an acid titration test using 1 M H2SO4,
we find that most of this CO2(g) is likely to come from electrochemical
decomposition of surface carbonates (for example, solid lithium carbonate) that probably form
between the active compounds and carbon black during the shaker-milling process.
For instance, Extended Data Fig. 5c shows the cumulative CO2 evolution during
acid titration of a shaker-milled mixture of Li2Mn1/2Ti1/2O2F and carbon black. CO2(g)
evolves from the mixture immediately after 1 M H2SO4 is added, with a total CO2
amount of about 0.17 μmol per mg of Li2Mn1/2Ti1/2O2F. This direct CO2 evolution
indicates chemical decomposition by the H2SO4 of an equimolar amount of carbonate
species (about 0.7 wt% of the powder mixture, assuming Li2CO3 as the carbonate
species), which can also decompose electrochemically. Because 5 V is
a high enough voltage to decompose carbonates electrochemically, we expect
an amount of CO2(g) similar to that in the acid titration to evolve from
the surface carbonates during charging of the Li2Mn1/2Ti1/2O2F electrode. This
would imply that about 71% (about 0.17 μmol mg−1 out of 0.24 μmol mg−1) of the
CO2 evolved during the first charge originates from carbonate decomposition.
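
The 71% figure follows directly from the two CO2 yields. The sketch below reproduces that ratio and converts the titrated CO2 into a Li2CO3 mass per mg of active material. It assumes that Li2CO3 is the only carbonate and that each formula unit releases one CO2 (Li2CO3 + H2SO4 → Li2SO4 + H2O + CO2). Converting the result into wt% of the powder mixture would also require the milling ratio of active material to carbon black, which is not given in this note, so the sketch stops at mass per mg of active material.

```python
# Sketch: carbonate bookkeeping, assuming Li2CO3 is the only carbonate present.

LI2CO3_MOLAR_MASS = 73.89  # g/mol


def carbonate_share(titrated_umol_per_mg: float, evolved_umol_per_mg: float) -> float:
    """Fraction of the first-charge CO2 attributable to titratable carbonate."""
    return titrated_umol_per_mg / evolved_umol_per_mg


def li2co3_ug_per_mg_active(titrated_umol_per_mg: float) -> float:
    """Li2CO3 mass (ug) per mg of active material; umol x g/mol = ug."""
    return titrated_umol_per_mg * LI2CO3_MOLAR_MASS


print(f"{100 * carbonate_share(0.17, 0.24):.0f}%")  # about 71%
print(li2co3_ug_per_mg_active(0.17))                # about 12.6 ug per mg of active material
```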

It is worth noting that some sub-surface Li2CO3 may not be detected by our
acid titration method, although this carbonate may still oxidize to CO2 during the
first charge, such that carbonate oxidation accounts for nearly all of the CO2
evolution observed in Extended Data Fig. 5a, b. For the many transition-metal oxides
studied using our gas evolution methods (for example, Ni-rich and Li/Mn-rich
Ni–Mn–Co oxides), residual Li2CO3, and not electrolyte degradation (below
4.8 V), has accounted for all CO2 evolution during the first charge, and a
similar phenomenon is likely to occur for these materials. Nevertheless, it is
possible, although less likely, that the additional CO2 evolved beyond that expected
from the titrated Li2CO3 quantity comes from direct electrolyte decomposition,
particularly at high voltages (>4.8 V), or from oxygen species evolved from
the materials reacting with the electrolyte.

Supplementary Note 3. Most Li-excess Mn-rich cathodes that use high levels
of oxygen redox experience voltage fading, a continuous reduction of both
charge and discharge voltages upon extended cycling. From the evolution of
average voltages upon cycling (Extended Data Fig. 6), we find that voltage fading
for Li2Mn2/3Nb1/3O2F is less pronounced than for other Mn-rich cathodes.
Comparing the 2nd and 20th cycles between 1.5 and 4.6 V, 1.5 and 4.8 V, and 1.5
and 5.0 V at 20 mA g−1, we observe a decrease in the average discharge voltage
of approximately 1.3%, 2.2% and 4.0%, respectively. Evidently, a higher charge
cut-off voltage results in a larger reduction of the discharge voltage upon cycling.
By contrast, the average charge voltage increases by about 1.8%, 1.3% and 1.9%
over the same cycles and voltage windows. In fact, the mean of the average charge
and discharge voltages ((charge voltage + discharge voltage)/2) changes by only about
0.3%, 0.27% and 0.7%, respectively. For Mn-rich Li-excess materials that experience
voltage fading, both the discharge and charge voltages decrease upon cycling. Our
result, on the other hand, shows a slight decrease in discharge voltage but an increase
in charge voltage, and the average of the two barely changes. This indicates that
the voltage change for Li2Mn2/3Nb1/3O2F is unlikely to result from irreversible
voltage fading, and instead results from impedance growth, for example from electrolyte
decomposition at high voltages above 4.5 V.
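
The argument above compares the average charge and discharge voltages with their mean. Growing cell resistance pushes the charge voltage up and the discharge voltage down by similar amounts, so the mean largely cancels it. Genuine voltage fading lowers both voltages, so the mean falls. The sketch below computes an energy-weighted average voltage by the trapezoidal rule and the percentage changes used in the argument. The profile and voltages are hypothetical, not data from Extended Data Fig. 6.

```python
# Sketch: average voltage and the (charge + discharge)/2 diagnostic (hypothetical numbers).


def average_voltage(capacity_mah_g, voltage_v) -> float:
    """Integral of V dQ divided by the capacity span (trapezoidal rule)."""
    energy = 0.0
    for i in range(1, len(capacity_mah_g)):
        dq = capacity_mah_g[i] - capacity_mah_g[i - 1]
        energy += 0.5 * (voltage_v[i] + voltage_v[i - 1]) * dq
    return energy / (capacity_mah_g[-1] - capacity_mah_g[0])


def percent_change(before: float, after: float) -> float:
    return 100.0 * (after - before) / before


def midpoint_voltage(v_charge: float, v_discharge: float) -> float:
    return 0.5 * (v_charge + v_discharge)


# Hypothetical cycle-2 and cycle-20 averages: discharge falls while charge rises.
c2, d2 = 3.60, 3.00
c20, d20 = 3.66, 2.96
print(percent_change(d2, d20), percent_change(c2, c20))
print(percent_change(midpoint_voltage(c2, d2), midpoint_voltage(c20, d20)))
```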

Supplementary Note 4. We performed hard XAS on the Li2Mn2/3Nb1/3O2F
material. Along with the rising edges (Extended Data Fig. 7a), the pre-edge features of
XANES spectra (Extended Data Fig. 7b) can give information about oxidation states.
The pre-edge feature (about 6,539 eV) in the Mn K-edge XANES spectra originates
from electron excitation from the Mn 1s state to mixed Mn 3d–4p states, which is
allowed in a non-centrosymmetric environment. A direct comparison of the Mn K-edge
pre-edge features of Li2Mn2/3Nb1/3O2F upon cycling is shown in Extended Data
Fig. 7b. To analyse their shape more clearly, the first derivatives of their pre-edges are
shown in Extended Data Fig. 7c–e. The first derivatives of the spectra ‘before
cycle’ and ‘after first charging to 375 mAh g−1 then discharging to 320 mAh g−1’
resemble that of MnO (Extended Data Fig. 7c), suggesting that most Mn ions in
the two samples are in the Mn2+ state. After first charging Li2Mn2/3Nb1/3O2F to
135 mAh g−1, the derivative shape looks similar to that of Mn2O3, indicating the
presence of Mn3+ in the sample (Extended Data Fig. 7d). After charging to 270
and 360 mAh g−1, the derivative shape changes towards that of MnO2, indicating
mostly Mn4+ ions, although Mn3+ and Mn2+ might also be present (Extended
Data Fig. 7e).
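
The derivative analysis above compares first-derivative shapes with those of reference oxides. A minimal, hypothetical helper for this kind of analysis is sketched below. It computes dμ/dE in a pre-edge window with NumPy and reports the energies where the derivative crosses zero from positive to negative, which correspond to pre-edge maxima. The 6,535–6,542 eV window mirrors the plotted range, and the spectrum is a synthetic Gaussian, not measured data. The comparison described in the text is of full derivative shapes against references, and this sketch automates only one feature of that comparison.

```python
import numpy as np


def pre_edge_derivative(energy_ev, mu_norm, window=(6535.0, 6542.0)):
    """Return (energies, d mu / dE) inside the pre-edge window."""
    e = np.asarray(energy_ev, dtype=float)
    mu = np.asarray(mu_norm, dtype=float)
    mask = (e >= window[0]) & (e <= window[1])
    return e[mask], np.gradient(mu[mask], e[mask])


def derivative_zero_crossings(e, d):
    """Energies where the derivative goes from + to - (local maxima), linearly interpolated."""
    peaks = []
    for i in range(len(d) - 1):
        if d[i] > 0 >= d[i + 1]:
            frac = d[i] / (d[i] - d[i + 1])
            peaks.append(float(e[i] + frac * (e[i + 1] - e[i])))
    return peaks


energy = np.arange(6530.0, 6545.0, 0.05)
mu = 0.05 * np.exp(-0.5 * ((energy - 6539.2) / 0.6) ** 2)  # synthetic pre-edge peak
e_win, d_win = pre_edge_derivative(energy, mu)
print(derivative_zero_crossings(e_win, d_win))
```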

Supplementary Note 5. Li2Mn1/2Ti1/2O2F exhibits promising cycling behaviour,
as does Li2Mn2/3Nb1/3O2F. When cycled between 1.6 V and 5.0 V (Extended Data
Fig. 10a), 2.0 V and 4.8 V (Extended Data Fig. 10b), and 2.3 V and 4.6 V (Extended
Data Fig. 10c) at 20 mA g−1, the 60 wt%:30 wt%:10 wt% Li2Mn1/2Ti1/2O2F:carbon
black:PTFE electrode delivers discharge capacities of up to 306 mAh g−1
(920 Wh kg−1), 227 mAh g−1 (739 Wh kg−1) and 160 mAh g−1 (534 Wh kg−1),
respectively. The rate capability of Li2Mn1/2Ti1/2O2F is acceptable. When cycled
at high rates of 200 and 400 mA g−1 between 1.6 V and 5.0 V, the material still
delivers discharge capacities of up to 210 mAh g−1 (629 Wh kg−1) and 158 mAh g−1
(461 Wh kg−1) (Extended Data Fig. 10d). The capacity retention of Li2Mn1/2Ti1/2O2F
(Extended Data Fig. 10e) is good and slightly better than that of Li2Mn2/3Nb1/3O2F
(Extended Data Fig. 4). When cycled at 100 mA g−1 and above, the capacity loss
during the initial 25 cycles is less than 0.4% per cycle. The 80 wt%:15 wt%:5 wt%
Li2Mn1/2Ti1/2O2F:carbon black:PTFE electrode exhibits performance similar to that of the
60 wt%:30 wt%:10 wt% electrode (Extended Data Fig. 10b, f).
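
Because 1 mAh g−1 × 1 V = 1 Wh kg−1, dividing each reported specific energy by its discharge capacity gives the implied mean discharge voltage. This offers a quick consistency check on the figures above. The sketch also shows one way to express a "loss per cycle", assuming linear fade. The retention numbers in the last line are hypothetical.

```python
# Sketch: consistency checks on specific capacity and specific energy.


def implied_mean_voltage(capacity_mah_g: float, energy_wh_kg: float) -> float:
    """Mean discharge voltage (V) = specific energy / specific capacity."""
    return energy_wh_kg / capacity_mah_g


def mean_loss_per_cycle_percent(c_first: float, c_last: float, n_cycles: int) -> float:
    """Average capacity loss per cycle (%) from cycle 1 to cycle n_cycles, assuming linear fade."""
    return 100.0 * (c_first - c_last) / c_first / (n_cycles - 1)


reported = [(306, 920), (227, 739), (160, 534), (210, 629), (158, 461)]
for cap, energy in reported:
    print(cap, energy, round(implied_mean_voltage(cap, energy), 2))

print(mean_loss_per_cycle_percent(200.0, 181.0, 25))  # hypothetical: about 0.40% per cycle
```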

Supplementary Note 6. Extended Data Fig. 11a, b shows the Mn K-edge XANES
spectra of Li2Mn1/2Ti1/2O2F before cycling, after charging to 120 mAh g−1, 240 mAh g−1
and 400 mAh g−1, and after charging to 400 mAh g−1 then discharging to
330 mAh g−1. Upon first charging from 0 to 120 mAh g−1 and 240 mAh g−1, the
Mn rising edge shifts from an energy between those of MnO and Mn2O3 to
that of Mn2O3, and then partway up towards that of MnO2. Further change is
small upon charging to 400 mAh g−1. The edge returns to its original position after
discharging to 330 mAh g−1. This result suggests that Mn ions in the as-prepared
Li2Mn1/2Ti1/2O2F compound are mostly Mn2+ (possibly with some Mn3+), which
are oxidized on charge towards Mn4+, with some Mn ions not fully oxidized. Upon
discharge, Mn ions return to Mn2+. Note that because the shape of Mn K-edge
spectra for a given oxidation state can vary considerably depending on bonding
environment, and there are no reported references for Mn-based disordered-oxyfluoride
compounds, quantitative analysis of our results is difficult.
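
Edge shifts like those described above are often tracked as the energy at which the normalized absorbance first reaches a chosen fraction, here 0.5. This half-height convention is an assumption for illustration, because the text does not state which edge definition was used. Given the caveat above about bonding-dependent edge shapes, such numbers support only relative comparisons between spectra. The coarse spectra below are hypothetical.

```python
# Sketch: rising-edge position at a fixed normalized absorbance (hypothetical spectra).


def edge_energy_at_fraction(energy_ev, mu_norm, fraction: float = 0.5) -> float:
    """First energy where the normalized absorbance reaches `fraction`, linearly interpolated."""
    for i in range(1, len(mu_norm)):
        if mu_norm[i - 1] < fraction <= mu_norm[i]:
            t = (fraction - mu_norm[i - 1]) / (mu_norm[i] - mu_norm[i - 1])
            return energy_ev[i - 1] + t * (energy_ev[i] - energy_ev[i - 1])
    raise ValueError("normalized absorbance never reaches the requested fraction")


# Hypothetical coarse spectra; the "charged" one rises at higher energy.
e = [6540.0, 6545.0, 6550.0, 6555.0, 6560.0]
mu_before = [0.0, 0.2, 0.8, 1.0, 1.0]
mu_charged = [0.0, 0.1, 0.5, 0.9, 1.0]
print(edge_energy_at_fraction(e, mu_before), edge_energy_at_fraction(e, mu_charged))
```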

Derivative analysis of the Mn pre-edge feature at about 6,539 eV (Extended Data
Fig. 11c–e) suggests the same Mn redox mechanism. The first derivatives of the
spectra ‘before cycle’ and ‘after first charging to 400 mAh g−1 then discharging
to 330 mAh g−1’ exhibit a shape that mixes the first-derivative spectra of MnO and
Mn2O3 (Extended Data Fig. 11c). This suggests the presence of Mn2+ ions together with
some partly oxidized Mn ions, such as Mn3+. After first charging to 120 mAh g−1,
the derivative shape looks similar to that of Mn2O3, indicating Mn3+ in the sample
(Extended Data Fig. 11d). After charging to 240 and 400 mAh g−1, the derivative
shape changes towards that of MnO2, suggesting a large amount of Mn4+ ions, although
Mn3+ and Mn2+ might also be present (Extended Data Fig. 11e).

The Ti K-edge spectra of the Li2Mn1/2Ti1/2O2F samples (Extended Data Fig. 11f)
resemble that of TiO2 (Ti4+), and their rising-edge position barely changes during
cycling, although there are minor changes in shape that indicate local Ti-site
distortion. This suggests that Ti exists as Ti4+ and is redox-silent. Because Ti4+
is redox inactive, we expect that the reversible capacity of Li2Mn1/2Ti1/2O2F beyond
the Mn capacity comes from O redox, as in the case of Li2Mn2/3Nb1/3O2F.

Data availability. The datasets generated and analysed during this study are avail- 
able from the corresponding authors on reasonable request. 
Extended Data Fig. 1 | Solid-state NMR spectroscopy results. a, b, 7Li
spin echo NMR spectra acquired on as-synthesized Li2Mn2/3Nb1/3O2F (a)
and Li2Mn1/2Ti1/2O2F (b) powders at 50 kHz MAS in a field B0 = 11.7 T.
The data have been fitted with a minimal number of Li sites: Li1, Li2 and
Li3. Spinning sidebands of the three Li signals are indicated with asterisks.
c, 19F spin echo sum spectra acquired on as-synthesized Li2Mn2/3Nb1/3O2F
and Li2Mn1/2Ti1/2O2F powders at 50 kHz MAS in a field B0 = 11.7 T. The
spectra are compared to the spin echo spectrum collected on LiF under
similar conditions. Spinning sidebands of the sharp LiF-like signals are
indicated with asterisks. Detailed explanations of the results are given in
Methods section ‘Supplementary Note 1’.
Extended Data Fig. 2 | Structural characterization of Li2Mn2/3Nb1/3O2F.
a, TEM image of as-synthesized Li2Mn2/3Nb1/3O2F particles. Scale bar,
50 nm. b, A high-magnification TEM image of the area enclosed in a
square in a. Scale bar, 10 nm. The yellow circle indicates the boundary of
one of the many grains in the polycrystalline Li2Mn2/3Nb1/3O2F particle.
c, An electron diffraction pattern of the Li2Mn2/3Nb1/3O2F particle.
Scale bar, 5 nm−1. d, Fast Fourier-transformed (FFT) images of the dotted
squared areas in b. e, The high-magnification image across the squared
areas 1, 2 and 3 in b. Scale bar, 5 nm. We can clearly observe lattice fringes
and FFT peaks throughout the particle, indicating that our particles are
made of small crystalline grains instead of amorphous phases.
Extended Data Fig. 3 | Additional electrochemical data from
Li2Mn2/3Nb1/3O2F. a, b, Voltage profiles of the 60:30:10 electrode (that
is, 60 wt% Li2Mn2/3Nb1/3O2F:30 wt% carbon black:10 wt% PTFE) when
cycled between 2.0 V and 4.8 V (a), and 2.3 V and 4.6 V (b) at 20 mA g−1.
c, d, Voltage profiles of the 70:20:10 (c) and the 80:15:5 (d) electrodes
when cycled between 1.5 V and 5.0 V at 20 mA g−1. e, Voltage profiles of
the 80:15:5 electrode when cycled between 2.0 V and 4.8 V at 20 mA g−1.
f, A comparison of the first discharge profiles of the 60:30:10, 70:20:10 and
80:15:5 Li2Mn2/3Nb1/3O2F electrodes (1.5–5.0 V, 20 mA g−1). The specific
capacity was calculated on the amount of the Li2Mn2/3Nb1/3O2F powder in
the cathode film.

[Extended Data Fig. 3 plots: voltage (V) versus specific capacity (mAh g−1), with cycle-number legends; panel annotations include 180 mAh g−1 (600 Wh kg−1) for the 60:30:10 electrode at 2.3–4.6 V and 292 mAh g−1 (920 Wh kg−1) for the 80:15:5 electrode, both at room temperature and 20 mA g−1]
Extended Data Fig. 4 | Discharge capacity retention. The 60:30:10
Li2Mn2/3Nb1/3O2F:carbon black:PTFE electrode was cycled between
1.5 V and 5.0 V at room temperature at 10, 20, 40, 100, 200, 400 and
1,000 mA g−1. A faster rate leads to less capacity fading during the initial
25 cycles. This is likely to be due to electrolyte decomposition per cycle
occurring more (less) at high voltage in a slower (faster) cycling test,
which increases the impedance of a cell by creating a resistive surface layer
and decreasing the ionic conductivity of the electrolyte.
powder mixture, as a function of time during an acid titration test using
1 M H2SO4. Detailed explanations of the results are given in Methods
section ‘Supplementary Note 2’. 1st c, first charge.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 

ARTICLE

[Extended Data Fig. 6 plot: average voltage (V) versus cycle number (1–19) for charge (c), discharge (dc) and (c+dc)/2, at 4.6 V, 4.8 V and 5.0 V charge cut-offs]

Extended Data Fig. 6 | Evolution of the charge and discharge voltages.
Average charge voltage (triangles), discharge voltage (stars), and the mean of
the charge and discharge voltages (circles) are shown when Li2Mn2/3Nb1/3O2F
is cycled between 1.5 V and 4.6 V, 1.5 V and 4.8 V, and 1.5 V and 5.0 V,
at 20 mA g−1. Detailed explanations of the results are given in Methods
section ‘Supplementary Note 3’. c, charge; dc, discharge.




[Extended Data Fig. 7a, b plots: Mn K-edge normalized absorbance (a.u.) versus energy (eV); legend entries include Before and 135 c]


Extended Data Fig. 7 | XANES of Li2Mn2/3Nb1/3O2F. a, b, Manganese
K-edge XANES spectra of Li2Mn2/3Nb1/3O2F: before cycle, after first
charging to 135 mAh g−1, 270 mAh g−1 and 360 mAh g−1, and after first
charging to 375 mAh g−1 then discharging to 320 mAh g−1. c–e, First
derivatives of normalized absorbance at the pre-edge region of Mn K-edge
spectra of Li2Mn2/3Nb1/3O2F: c, before cycle and after first charging to
375 mAh g−1 then discharging to 320 mAh g−1; d, after first charging to
135 mAh g−1; and e, to 270 mAh g−1 and 360 mAh g−1. Data from MnO,
Mn2O3 and MnO2 are presented for comparison. Detailed explanations of
the results are given in Methods section ‘Supplementary Note 4’.
Extended Data Fig. 8 | Niobium K-edge XANES spectra of
Li2Mn2/3Nb1/3O2F obtained by hard XAS. Results are shown before
cycle, after charging to 135 mAh g−1, 270 mAh g−1 and 360 mAh g−1, and
after charging to 375 mAh g−1 then discharging to 320 mAh g−1. The Nb
K-edge XANES spectra of the Li2Mn2/3Nb1/3O2F samples are similar to
that of Nb2O5 (Nb5+ reference), indicating that Nb in the compound stays
as Nb5+ during cycling. The small observable shape changes are likely to
be related to changes in local disorder and distortion.
Extended Data Fig. 9 | Structural characterization of Li2Mn1/2Ti1/2O2F.
a, TEM image of as-synthesized Li2Mn1/2Ti1/2O2F particles. Scale bar,
50 nm. b, A high-magnification TEM image of the area enclosed in a
square in a. Scale bar, 10 nm. The yellow circle indicates the boundary
of one of the many grains in the polycrystalline Li2Mn1/2Ti1/2O2F particle.
c, An electron diffraction pattern of the Li2Mn1/2Ti1/2O2F particle.
Scale bar, 5 nm−1. d, FFT images of the dotted squared areas in b. e, The
high-magnification image across the squared areas 1, 2 and 3 in b. Scale
bar, 5 nm. We can clearly observe lattice fringes and FFT peaks throughout
the particle, indicating that our particles are made of small crystalline
grains instead of amorphous phases.
Extended Data Fig. 10 | Electrochemical properties of Li2Mn1/2Ti1/2O2F.
a–c, Voltage profiles and capacity retention of the 60:30:10
Li2Mn1/2Ti1/2O2F:carbon black:PTFE electrode when cycled at 20 mA g−1
at room temperature between 1.6 V and 5.0 V (a), 2.0 V and 4.8 V (b), and
2.3 V and 4.6 V (c). d, The initial charge–discharge profile of the 60:30:10
electrode when cycled between 1.6 V and 5.0 V at room temperature at
20, 40, 100, 200, 400 and 1,000 mA g−1. e, The discharge capacities during the
initial 25 cycles. f, Voltage profiles and capacity retention of the 80:15:5
electrode when cycled at 20 mA g−1 at room temperature between 2.0 V
and 4.8 V. The specific capacity was calculated on the amount of the
Li2Mn1/2Ti1/2O2F powder in the cathode film. Detailed explanations of the
results are given in Methods section ‘Supplementary Note 5’.
Extended Data Fig. 11 | XANES of Li2Mn1/2Ti1/2O2F. a, b, Manganese
K-edge XANES spectra of Li2Mn1/2Ti1/2O2F: before cycle (black),
120 mAh g−1 charged (navy), 240 mAh g−1 charged (wine), 400 mAh g−1
charged (grey), and 330 mAh g−1 discharged after a 400 mAh g−1 charge (dark
yellow). c–e, First derivatives of normalized absorbance at the pre-edge
region of Mn K-edge spectra of Li2Mn1/2Ti1/2O2F: c, before cycle and after
first charging to 400 mAh g−1 then discharging to 330 mAh g−1; d, after
first charging to 120 mAh g−1; and e, to 240 mAh g−1 and 400 mAh g−1.
f, Titanium K-edge XANES spectra of Li2Mn1/2Ti1/2O2F during the initial
cycle. Data from MnO, Mn2O3, MnO2, Ti2O3 and TiO2 are presented for
comparison. Detailed explanations of the results are given in Methods
section ‘Supplementary Note 6’.
Extended Data Table 1 | Structural parameters from the Rietveld refinements
The Rietveld refinements are shown in Figs. 1b and 5a. The crystallographic information file of Fm-3m LiFeO2 (ICSD collection code 51208) was used as an input file. A pseudo-Voigt fit was used
(U, V, W = 8.0691, −0.9697, 1.3778 for Li2Mn2/3Nb1/3O2F, and 5.8736, −1, 1.4118 for Li2Mn1/2Ti1/2O2F). The atomic occupancies were initially set to the atomic ratio obtained from elemental analysis
by direct-current plasma emission spectroscopy and an ion-selective electrode, on the basis of which the lattice parameters were first refined. We then further refined the lattice parameters and the atomic
occupancies together. Transition-metal occupancies were first refined freely. Then the O and F occupancies were individually refined with the constraint that their occupancies sum to 1. Finally, all atomic
occupancies, including Li occupancy, were refined simultaneously with the additional constraint that the total transition-metal occupancy stay unchanged during this final process. However,
because O and F are difficult to distinguish by XRD and Li cannot be seen clearly, their occupancy values are more subject to error.
Extended Data Table 2 | Target versus measured atomic ratio of Li2Mn2/3Nb1/3O2F and Li2Mn1/2Ti1/2O2F compounds

Materials | Li2Mn2/3Nb1/3O2F (Li : Mn : Nb : F) | Li2Mn1/2Ti1/2O2F (Li : Mn : Ti : F)
Target atomic ratio | 2 : 0.666 : 0.333 : 1 | 2 : 0.5 : 0.5 : 1
Measured atomic ratio | 1.852 : 0.660 : 0.333 : 1.05 | 2.01 : 0.514 : 0.475 : 1.05

Measurements were made by direct-current plasma emission spectroscopy (Li, Mn, Nb, Ti) and with an ion-selective electrode (F).
Observed fingerprint of a weakening
Atlantic Ocean overturning circulation 


L. Caesar*, S. Rahmstorf*, A. Robinson, G. Feulner & V. Saba


The Atlantic meridional overturning circulation (AMOC)—a system of ocean currents in the North Atlantic—has a 
major impact on climate, yet its evolution during the industrial era is poorly known owing to a lack of direct current 
measurements. Here we provide evidence for a weakening of the AMOC by about 3 ± 1 sverdrups (around 15 per cent) since
the mid-twentieth century. This weakening is revealed by a characteristic spatial and seasonal sea-surface temperature 
‘fingerprint’ —consisting of a pattern of cooling in the subpolar Atlantic Ocean and warming in the Gulf Stream region— 
and is calibrated through an ensemble of model simulations from the CMIP5 project. We find this fingerprint both 
in a high-resolution climate model in response to increasing atmospheric carbon dioxide concentrations, and in the 
temperature trends observed since the late nineteenth century. The pattern can be explained by a slowdown in the AMOC 
and reduced northward heat transport, as well as an associated northward shift of the Gulf Stream. Comparisons with 
recent direct measurements from the RAPID project and several other studies provide a consistent depiction of record-low AMOC values in recent years.


The AMOC is one of Earth’s major ocean circulation systems, redis- 
tributing heat on our planet and thereby affecting its climate. At the 
same time, it is a highly nonlinear system with a critical threshold, 
depending on a delicate balance of temperature and salinity effects 
on density, and is considered one of the main tipping elements of the 
Earth system!”. Changes in Atlantic overturning have been respon- 
sible for some of the strongest and most rapid climate shifts during 
the Quaternary Period (the past 2.6 million years)? . These historical 
changes in the AMOC have not only affected the North Atlantic and 
surrounding landmasses, but have also had global impacts. For exam- 
ple, a slowdown of the AMOC is associated with a southward shift of 
the tropical rainfall belt and a warming of the Southern Ocean and 
Antarctica (the ‘see-saw’ response)**. 

Given the potentially disruptive impact of a major change in the 
AMOC, it is imperative to better understand whether and how the 
AMOC is responding to modern anthropogenic warming. Direct con- 
tinuous measurements of the AMOC have only been available for a little 
over a decade and are therefore probably dominated by natural vari- 
ability*. The longer-term evolution of the AMOC needs to be recon- 
structed from indirect indicators. Based on the observed cooling trend 
in the subpolar Atlantic since the early twentieth century, recent studies 
have suggested that the AMOC may have slowed over this period>’. 
However, it has also been suggested that another mechanism could 
explain the subpolar Atlantic cooling, for example, the increasing aer- 
osol load of the atmosphere’. 

Here we use the latest high-resolution climate model results to iden- 
tify a characteristic sea-surface temperature (SST) fingerprint, con- 
sisting of a cooling in the subpolar gyre region and a warming in the 
Gulf Stream region, which in the climate model is associated with an 
AMOC reduction in response to rising atmospheric carbon dioxide 
(CO2) levels®. We then compare this fingerprint with the observed SST
evolution since the late nineteenth century, including consideration of 
the seasonal cycle. We use the climate-model ensemble of the Coupled 
Model Intercomparison Project Phase 5 (CMIP5) to test and calibrate
a revised AMOC index, and we present a new reconstruction of the
AMOC evolution for the period 1870 to 2016. This index reaches 
record-low values in the past few years and, for the periods of overlap, 
is consistent with direct measurements, reanalysis data of the AMOC 
since 1995 and other AMOC studies. 


Comparing climate model and SST observations 

We use the CM2.6 coupled global climate model, which provides high 
horizontal resolution of around 50 km in the atmosphere and 10 km 
in the ocean (see Methods). The latter is important for analysing SST 
data because high resolution helps to reduce regional SST biases!°. The 
model resolves mesoscale ocean eddies'’ and shows a more realistic 
simulation of the Gulf Stream relative to coarser model versions. In par- 
ticular, this model practically eliminates a bias in the separation point 
of the Gulf Stream from the United States’ coastline (leading to a warm 
and salty bias along the continental shelf), which is common in coarser 
climate models assessed by the Intergovernmental Panel on Climate 
Change (IPCC)*. After appropriate spin-up, we used two simulations: a 
control simulation of 80 years’ duration with CO2 concentrations fixed
at the 1860 level, and a run in which atmospheric CO2 increased by 1%
per year over 70 years until it doubled, and then remained at this level 
for another 10 years. 
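As a quick arithmetic check of this forcing protocol, a 1% per year compound increase reaches a doubling after about 70 years. A short Python sketch; the ratio is relative to the 1860 level, whose absolute value is not needed here:

```python
import math


def co2_ratio(years: float, rate: float = 0.01) -> float:
    """CO2 concentration relative to its starting level after compound growth."""
    return (1.0 + rate) ** years


# Years needed to reach exactly twice the starting level at 1% per year.
doubling_time = math.log(2.0) / math.log(1.01)

print(f"ratio after 70 years: {co2_ratio(70):.3f}")
print(f"exact doubling time: {doubling_time:.1f} years")
```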

Figure 1 shows the linear trend in SST over the ‘CO2-doubling’
experiment and the corresponding control run, compared with the 
observed trend from 1870 to 2016 (owing to the extreme computa- 
tional costs of the CM2.6 model, neither a simulation with historic 
forcing nor ensemble studies are available). The trend pattern of the 
observed SSTs is not sensitive to the choice of the time interval used 
to calculate the linear trend (see Extended Data Fig. 1). Figure 1 shows 
that the control run is almost free of SST trends, and that the observed 
SST trend pattern resembles that obtained in the CO2-doubling
experiment. To account for the much larger global SST warming (by 
a factor of four) seen in the model experiment compared with obser-
vations, in Fig. 2 we divide both patterns by the global mean SST trend
to normalize the amplitude. A global view of these SST trends is shown
in Extended Data Fig. 2.

Marine Fisheries Service, Northeast Fisheries Science Center, Geophysical Fluid Dynamics Laboratory, Princeton University, Princeton, NJ, USA. *e-mail: caesar@pik-potsdam.de;

Fig. 1 | Comparison of SST trends in model and observations. Left and
middle, linear SST trends obtained using the CM2.6 climate model of the
Geophysical Fluid Dynamics Laboratory (GFDL) during a CO2-doubling
experiment (left) and in a control run with fixed CO2 concentrations

The comparison of the normalized modelled and observed SST 
trend patterns (Fig. 2) shows a remarkable resemblance, especially 
when focusing on the northern Atlantic—the area where SSTs are most 
affected by changes in the AMOC. Both patterns comprise an area 
of below-average warming (normalized trend < 1) and cooling (nor-
malized trend < 0) in the subpolar gyre region. This lack of warming
or cooling is associated with a slowdown of the AMOC by around 4 
sverdrups (Sv; 1 Sv = 10⁶ m³ s⁻¹), as predicted by the CM2.6 simu-
lation (see Fig. 3), and a corresponding reduction in heat transport
into that region. This feature is accompanied by an above-average 
warming (normalized trend > 1) in the vicinity of the Gulf Stream, 
which is enhanced by up to a factor of four to five over the global mean
warming (for a definition of the regions, see inset of Fig. 3). The median 
trend of the subpolar gyre region is located at the third percentile of all 
trends in the observational data, and at the first percentile in the model. 
The median trends in the Gulf Stream region are located at the 96th 
and 98th percentiles of all trends in the observational data and model, 
respectively (see Methods and Extended Data Fig. 3). We define the 
combination of these features as the AMOC fingerprint, as both signals 
can be physically linked to changes in the AMOC. 
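The normalization and the percentile statements above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' code: grid cells are weighted equally (no area weighting), the percentile rank is taken as the share of all valid cells whose normalized trend is at or below the regional median, and the trend map and the two region masks are made-up toy values.

```python
import numpy as np


def normalize_trends(trend_map: np.ndarray, global_mean_trend: float) -> np.ndarray:
    """Local SST trends divided by the global-mean SST trend (1 = average warming)."""
    return trend_map / global_mean_trend


def percentile_of_regional_median(norm: np.ndarray, region: np.ndarray) -> float:
    """Percentile rank (0-100) of a region's median trend among all valid cells."""
    valid = np.isfinite(norm)
    regional_median = np.median(norm[region & valid])
    return 100.0 * np.mean(norm[valid] <= regional_median)


# Toy 3 x 4 trend map in K per century (hypothetical numbers).
trends = np.array([[0.2, 0.5, 0.6, 0.7],
                   [-0.1, 0.4, 0.5, 0.8],
                   [0.5, 0.6, 1.5, 2.0]])
norm = normalize_trends(trends, global_mean_trend=0.5)

cold_patch = np.zeros_like(norm, dtype=bool)
cold_patch[0:2, 0] = True          # hypothetical "subpolar gyre" cells
warm_patch = np.zeros_like(norm, dtype=bool)
warm_patch[2, 2:] = True           # hypothetical "Gulf Stream" cells

print(percentile_of_regional_median(norm, cold_patch))
print(percentile_of_regional_median(norm, warm_patch))
```

A low percentile for the cold patch and a high one for the warm patch is the kind of result the text reports for the real fields.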

Although the cold patch in the subpolar gyre region has previously 
been connected to a slowdown of the AMOC’ and is present in the 
CMIP5 simulations”, here we are able to link the extreme warming 
observed along the US northeast coast to the Gulf Stream shifting 
northwards and closer to shore as a consequence of an AMOC slow- 
down (see Extended Data Fig. 4a). An opposite (that is, southward) 
Gulf Stream shift has previously been found as a response to an AMOC 
strengthening in idealized model simulations in which the AMOC 
was deliberately enhanced by an imposed density anomaly in the deep 
overflow from the Nordic Seas; this overflow feeds the lower branch 
of the AMOC}S, the deep western boundary current (DWBC). The 
physical mechanism of the interaction of the DWBC with the Gulf 
Stream at their crossing point is a robust mechanism that is known 
from theory and from both conceptual and more complex models: it 
is a consequence of vorticity conservation on a rotating sphere’*. The 
downslope flow of the DWBC in the crossover region leads to vortex 
stretching, which must be balanced higher up in the water column, 
leading to the formation of a northern recirculation gyre that forces 
the Gulf Stream to separate from the US east coast. As the flow of the 
DWBC is strengthened, the recirculation gyre becomes stronger and
the separation point of the Gulf Stream moves southwards. Given that 
the Gulf Stream transports warm water, this signal is reflected in the 
SST. For a more detailed discussion of this mechanism, see Methods. 
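As a minimal illustration, assuming a single homogeneous layer in place of the stratified ocean discussed in the text, the underlying constraint is conservation of shallow-water potential vorticity following a water column of thickness $h$, with Coriolis parameter $f$ and relative vorticity $\zeta$:

$$\frac{D}{Dt}\left(\frac{f+\zeta}{h}\right) = 0 .$$

If a column flowing downslope is stretched ($h$ increases) while $f$ changes little, $f+\zeta$ must increase in proportion to $h$, so the column gains cyclonic relative vorticity; it is this vortex stretching that has to be balanced higher up in the water column.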

(middle). Right, observed SST trends from 1870 to 2016 (HadISST data).
We used data from the November–May season. Note the different scales
related to the differing amounts of CO2 forcing between model and
observations.

The physical mechanism behind the warming also explains why
it cannot be seen in climate models with a coarser ocean resolution,
including versions that are similar to CM2.6, with the same atmosphere
but a coarser ocean resolution. Only the high-resolution model accu- 
rately represents the formation of the northern recirculation gyre and 
thus the correct coastal separation position of the Gulf Stream, which 
is a necessary condition for modelling the shifts in the Gulf Stream that 
are due to changes in AMOC strength. The northward shift of the warm 
water of the Gulf Stream leads to extreme warming along the US coast 
and a cooling to the south of this warming (as can be seen by the blue 
area to the south of the Gulf Stream in the CM2.6 simulation; Fig. 2). 
Another indication of a northward shift of the Gulf Stream in the CM2.6 
model is enhanced warming of ocean-bottom temperatures on the con- 
tinental shelf, particularly in the Gulf of Maine, as a result of a poleward 
retreat of the Labrador Current following the northward shift®. This 
warm part of the AMOC fingerprint cannot be explained by aerosol 
shading. The cooling in the subpolar gyre region in the CM2.6 model 
cannot be caused by aerosols either, because the modelled response is 
entirely CO2-driven—that is, no aerosol forcing was prescribed. This 
strongly supports earlier arguments against the aerosol hypothesis’». 

We have looked for the fingerprint of an AMOC slowdown in seven 
available observational SST data products (Extended Data Fig. 5). All 
of these datasets show the cold patch in the subpolar Atlantic, and, to 
a greater or lesser extent, the enhanced warming inshore of the Gulf 
Stream. The weaker cooling signal just south of this warming cannot 
be seen in most of the observational datasets (except the COBE data; 
see Extended Data Fig. 5). This could be because of the lower spatial 
resolution of the observational data products and the smaller AMOC 
decline in the observations as compared with the model simulation. 
The data products are distinct partly because of the different input data- 
bases used, and because of different degrees of data homogenization, 
bias adjustment, averaging and interpolation, which preserve different 
amounts of spatial and temporal structure (see Extended Data Table 1). 
The main difference is that, for example, the ERSST data concentrate 
on the preservation of temporal structure, whereas the HadISST data 
focus on the preservation of spatial structure. As we are interested in 
the spatial pattern of longer-term trends, in Fig. 2 we show the SST 
data with the best combination of spatial resolution (1.0° × 1.0°),
spatial preservation and quality control, namely, the HadISST data’®. 

We note that the sea-ice-covered regions of the Arctic Ocean show 
no temperature trend, consistent with the assumption that SST remains 
close to freezing point there. In the observations, this blue area is 
crossed by a red line where the sea-ice margin has retreated (Fig. 2). 
The linkages of the AMOC in the open Atlantic to the northward flow 
of Atlantic waters past Iceland warrant further investigation, but are 
beyond the scope of this paper. 

Finally, both model and data show widespread above-average warm-
ing in the South Atlantic, consistent with the temperature see-saw
effect of an AMOC decline leading to reduced northward ocean heat
transport across the equator'”!®. The observations show particularly
strong warming along the Benguela Current and its northward exten-
sion towards the Gulf of Guinea. This is a common response in climate
models to an AMOC weakening”), and is related to a reduced cold
northward flow, but is not seen in the CM2.6 simulations. This omis-
sion might be related to the model’s representation of the AMOC or
of wind-driven circulation in the South Atlantic, and needs further
investigation.

Fig. 2 | Comparison of normalized SST trends. Left, linear SST trends
during a CO2-doubling experiment using the GFDL CM2.6 climate
model. Right, observed trends during 1870–2016 (HadISST data). Both
sets of data are normalized with the respective global mean SST trends,
and in both cases we used data from the November–May season. Regions
that show cooling or below-average warming are shown in blue; regions
that show above-average warming are in red. Owing to the much greater
climate change in the CO2-doubling experiment, the signal-to-noise ratio
for the modelled SST trends is better than that for the observations.

Fig. 3 | Comparison of time series of SST anomalies and the strength
of the overturning circulation in the CM2.6 model. The graph shows
time series of SST anomalies (relative to global mean SSTs) in the subpolar
gyre (sg; dark blue) and Gulf Stream (gs; red) regions in the CO2-doubling
run relative to the control run, as predicted by the CM2.6 model. These
two regions are defined as shown in the inset (see Methods). The anomaly
of the actual AMOC overturning rate relative to the control run is also
shown (light blue). Thin lines show individual years (November to May
for SSTs), and thick lines show 20-year locally weighted scatterplot
smoothing (LOWESS) filtered data. Using the CMIP5 ensemble, we
independently determined a conversion factor of 3.8 Sv K⁻¹ between the
SST anomaly and the AMOC anomaly.
Fig. 4 | Seasonal variation in SSTs in the subpolar gyre region. We show
here the seasonal cycle in the normalized SST trend in the subpolar gyre
(sg) region for the CM2.6 model (light blue) and HadISST data (dark
blue). A value of 1 represents annual-mean, global-mean warming. In
addition, we show the seasonal cycle of the normalized global-mean SST
trend for the model (light green) and observations (dark green). The
SST trends in the subpolar gyre region are well below the global-mean
warming year-round (differences are given in numbers along the x axis
for the CM2.6 model (light grey) and the HadISST data (dark grey) and
highlighted by arrows), yet are smallest during the cold part of the year for
both observations and model.


12 APRIL 2018 | VOL 556 | NATURE | 193 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


ARTICLE

[Fig. 5 plot: AMOC anomaly (Sv per century) versus AMOC index change (K per century) for the CMIP5 models CanESM2, CCSM4, CESM1-BGC, CESM1-CAM5, CESM1-CAM5-1-FV2, CNRM-CM5, GFDL-ESM2M, GISS-E2-R, INMCM4, MPI-ESM-LR, MPI-ESM-MR, MRI-CGCM3, MRI-ESM1, NorESM1-M and NorESM1-ME; fitted slope = 3.8 ± 0.5 Sv K⁻¹, R = 0.95.]


Fig. 5 | Results of the CMIP5 ensemble regression analysis. The graph
shows the linear trend in the simulated AMOC decline versus the SST-
based AMOC index (November–May data) in ‘historic’ climate model
runs from 1870 to 2016, using the CMIP5 climate model ensemble. (The
runs were extended from 2006 to 2016 with simulations of the RCP8.5

The subpolar cold patch as an AMOC indicator 

The surface temperature in the subpolar gyre region, relative to the 
large-scale temperature trend, has been proposed as an index for 
longer-term AMOC variations’. Here we test and develop this concept 
further. Figure 4 compares the seasonal cycle in the linear SST trend in 
the subpolar gyre region from the HadISST data since 1870 with the 
80-year CO2-doubling experiment. The figure shows that the cooling 
(relative to the global mean SST) in this region is most pronounced 
during winter and spring. This is to be expected if the relative cold in 
this area is due to an AMOC slowdown and therefore driven by the 
ocean. In summer, a shallow surface mixed layer develops that is more 
susceptible to surface forcing than to horizontal heat advection, so the 
cold patch can be effectively capped and hidden by a warm surface 
layer. It typically re-emerges in autumn. 
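The seasonal diagnostic of Fig. 4 amounts to fitting a separate linear trend for each calendar month in the subpolar gyre region and dividing it by the annual-mean, global-mean trend. A minimal NumPy sketch, assuming the regional and global SSTs are already available as (years × 12 months) arrays; the synthetic input below is hypothetical and simply makes January to April warm more slowly than the other months:

```python
import numpy as np


def monthly_normalized_trends(sst_sg: np.ndarray, sst_global: np.ndarray,
                              years: np.ndarray) -> np.ndarray:
    """Per-calendar-month trend of sst_sg divided by the annual-mean global trend.

    sst_sg, sst_global: arrays of shape (n_years, 12) with monthly-mean SST.
    Returns 12 values; 1 means warming at the annual-mean, global-mean rate.
    """
    global_trend = np.polyfit(years, sst_global.mean(axis=1), 1)[0]
    month_trends = np.array([np.polyfit(years, sst_sg[:, m], 1)[0]
                             for m in range(12)])
    return month_trends / global_trend


years = np.arange(1870, 2017)
t = (years - years[0])[:, None]                 # years since start, as a column
sst_global = 0.005 * t * np.ones((1, 12))       # 0.5 K per century in every month
slopes = np.full(12, 0.003)
slopes[0:4] = 0.001                             # weaker warming in Jan-Apr
sst_sg = t * slopes                             # broadcasts to (n_years, 12)

print(np.round(monthly_normalized_trends(sst_sg, sst_global, years), 2))
```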

Given this result, in Fig. 2 we show the linear trends for November to 
May and below we propose an improved AMOC index based on these 
months, with a better signal-to-noise ratio than that obtained using 
annual data. The AMOC fingerprint pattern itself is not sensitive to 
the choice of the winter and spring seasons, as the linear trends of the 
annual data show (Extended Data Fig. 1). 


Performance of the AMOC index in models 

Given the hypothesis that a slowdown of the AMOC leads to a region 
of relative cooling near the subpolar gyre and a region of above-average 
warming in the vicinity of the Gulf Stream, we test whether in the 
models the temperatures in these regions can be used to reconstruct 
changes in the AMOC. 

Figure 3 shows time series of the mean temperatures of the subpolar 
gyre (sg, dark blue line) and the Gulf Stream (gs, red line) regions rel- 
ative to—that is, minus—the global mean SST. The averaging regions 
are defined as shown in the inset of Fig. 3 (see Methods). 

scenario.) Orthogonal regression analysis was performed with n = 12
models (indicated by coloured symbols). The grey area marks the 2σ
confidence interval. The three models labelled in grey were not included in
the regression owing to unrealistic AMOC representation; see Methods.

The two modelled SST time series are anti-correlated (R = −0.73),
yet the pronounced temperature maximum in the Gulf Stream region
around model year 50 (red line), which is unrelated to an AMOC
change in the model (light blue line), suggests that variability due to
factors other than the AMOC is substantially affecting the temper-
ature of the warm patch. This is to be expected particularly for the
coastal waters in the Gulf Stream region, which are more susceptible
to wind-forced SST changes, for example owing to the presence of
strong horizontal gradients and coastal upwelling or downwelling. In
accordance with this, the observed time series for the warm and cold
patches are only moderately anti-correlated (R = −0.36). This variabil-
ity, unrelated to the AMOC, makes the warm patch unsuitable for use
as an AMOC proxy owing to its poor signal-to-noise ratio, in contrast 
to the subpolar cold patch (see below). To maximize the signal-to-noise 
ratio, we base the AMOC index definition only on the subpolar gyre 
data (see Methods). 

To test the ability of this index to detect past AMOC changes, we
turn to the CMIP5 coupled climate model ensemble”®, using all simu- 
lations for which an AMOC diagnostic is available (n = 15; Extended 
Data Table 1). The region defining the subpolar cold patch is chosen 
to be large enough to encompass the cooling found across all models, 
because its exact location differs in each model. Figure 5 shows the 
linear 1870-2016 trend in the AMOC index, as well as in the actual 
AMOC, in these models. The correlation for the models with a realis-
tic AMOC has R = 0.95, so the AMOC variation explains 89% of the
variance in the AMOC index. This confirms that the AMOC (at least
on this long timescale) is indeed the dominant factor controlling the 
SST anomaly in the subpolar Atlantic. Hence the AMOC index can be 
used with confidence to identify the AMOC decline since 1870. The 
total-least-squares line shown in Fig. 5 has a slope of 3.8 Sv K⁻¹ and an
intercept of 0.1 Sv for the chosen subpolar gyre region (for more infor- 
mation on the regression, see Methods). The very small intercept value 
suggests that factors other than the AMOC have a minor influence 
on SST changes in the subpolar Atlantic. For example, a local aerosol 
cooling effect, relative to the global mean SST change, would cause a 
systematic offset in this regression. Given that this offset is negligible, 
however, the slope value of 3.8 Sv K⁻¹ can be used to calibrate between
the AMOC index and the AMOC strength. 
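The calibration step can be sketched with an orthogonal (total-least-squares) line fit, taken here as the principal axis of the (index trend, AMOC trend) pairs through their centroid. This is a minimal NumPy illustration, not the authors' implementation; the five synthetic points are hypothetical and lie exactly on a line with the slope and intercept quoted in the text, so the fit recovers them. Note that an orthogonal fit depends on the units chosen for the two axes.

```python
import numpy as np


def total_least_squares(x, y):
    """Fit y = slope * x + intercept by minimizing perpendicular distances.

    The fitted line runs through the data centroid along the leading
    eigenvector of the 2 x 2 covariance matrix of (x, y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(np.vstack([x, y]))
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    vx, vy = eigvecs[:, np.argmax(eigvals)]    # direction of the principal axis
    slope = vy / vx
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


# Hypothetical model trends: AMOC index change (K per century) and
# AMOC change (Sv per century), constructed to lie on one line.
index_trend = np.array([-0.8, -0.4, -0.2, 0.0, 0.2])
amoc_trend = 3.8 * index_trend + 0.1

slope, intercept = total_least_squares(index_trend, amoc_trend)
print(f"slope = {slope:.2f} Sv/K, intercept = {intercept:.2f} Sv")
```

Unlike ordinary least squares, this fit treats errors in both the index and the AMOC trend symmetrically, which is why it suits a regression across models.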


AMOC time evolution 

In Fig. 6 we show the time evolution of the AMOC, reconstructed from 
observational SST data (blue curve) from the period 1870–2016 using
the calibration factor 3.8 Sv K⁻¹ found from the CMIP5 models (for a
comparison with the earlier AMOC index’, see Extended Data Fig. 6). 
This time evolution suggests that the AMOC reached a minimum 
around 1990, recovered to a peak value in the early 2000s, and then 
declined again. As shown, this time evolution is consistent with the 
linear decline measured by the RAPID project (at 26° N)! since 2004, 
with that reconstructed by the GloSea5 ocean reanalysis” since 1995, 
and with a reconstruction from satellite altimetry and cable measure- 
ments”. It is also consistent with the finding” of a reduction in AMOC 
strength of approximately 2.6 Sv from the end of the 1950s until today, 
and with the observation”® of an AMOC strengthening from the 1980s 
until the mid-2000s. An analysis of recent (2004-2016) subsurface tem- 
perature data”® found cold subsurface anomalies around the latitude 
of the Gulf Stream (38° N) that could be associated with a shift in the 
meridional position of the Gulf Stream towards the north, supporting 
our argument for such a shift in response to an AMOC decline. 

The observed index decline of −0.44 K per century translates into
an AMOC trend of —1.7 Sv per century, or a 2.3-Sv linear weakening 
over the 136-year period. As Fig. 5 shows, this AMOC decline is within 
the range of AMOC decline predicted by the CMIP5 climate models 
in response to historic (mostly anthropogenic) forcing. Considering 
the 20-year smoothed curve rather than the linear trend, the AMOC 
weakening until today has been around 3 Sv, and has mainly occurred 
since the 1950s (Fig. 6). 
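The conversion in the paragraph above is a single multiplication by the calibration slope. A short Python check using only numbers stated in the text (the 3.8 Sv K⁻¹ slope, the −0.44 K per century index trend and the 136-year period as given):

```python
CALIBRATION_SV_PER_K = 3.8          # slope of the CMIP5 regression (Fig. 5)
index_trend_K_per_century = -0.44   # observed AMOC index trend
period_years = 136                  # period length as stated in the text

# Index trend (K per century) times slope (Sv per K) gives Sv per century.
amoc_trend_Sv_per_century = CALIBRATION_SV_PER_K * index_trend_K_per_century
total_change_Sv = amoc_trend_Sv_per_century * period_years / 100.0

print(f"{amoc_trend_Sv_per_century:.1f} Sv per century")
print(f"{total_change_Sv:.1f} Sv over {period_years} years")
```

The negative sign of the total change corresponds to the 2.3-Sv weakening quoted in the text.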

Comparing the SST anomalies in the CM2.6 model (Fig. 3) and 
observations (Fig. 6), one can see that generally they show similar 
magnitudes of interannual and interdecadal variability. To estimate 
the different types of variability, we apply a 20-year LOWESS filter?” 
to the data, which should largely remove any short-term variability 
in the SST that is unrelated to the AMOC. We estimate the interan- 
nual variability from the standard deviation of the annual time series 
minus the 20-year LOWESS-smoothed data. We find the variability 
in the cold patch to be 0.20 K and 0.19 K from the high-resolution 
model and observations, respectively. The interannual variability 
in the warm patch is 0.30 K for both model and observations. We 
estimate the interdecadal variability from the standard deviation 
of the 20-year LOWESS-smoothed data minus the linear trend of 
the smoothed data. The variability is 0.14 K (model) and 0.15 K
(observations) for the cold patch, and 0.21 K (model) and 0.18 K
(observations) for the warm patch. A discussion of how our results 
relate to the dominant modes of atmospheric variability in the North 
Atlantic can be found in Methods. 
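The two variability measures can be sketched as follows. The smoother below is a tricube-weighted local linear regression, that is, LOWESS without its robustness iterations, with a window of ±10 years standing in for the 20-year filter; both the window convention and the population standard deviation (NumPy's default ddof=0) are assumptions, and the input series in the usage lines are synthetic.

```python
import numpy as np


def local_linear_smooth(t, y, window_years=20.0):
    """Tricube-weighted local linear fit evaluated at every time step.

    Neighbours within +/- window_years / 2 get weight (1 - d**3)**3, where d
    is the scaled distance; the fitted intercept at t0 is the smoothed value.
    No robustness iterations are performed.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    half = window_years / 2.0
    smoothed = np.empty_like(y)
    for i, t0 in enumerate(t):
        d = np.abs(t - t0) / half
        w = np.where(d < 1.0, (1.0 - d**3) ** 3, 0.0)
        sw = np.sqrt(w)
        X = np.column_stack([np.ones_like(t), t - t0])
        coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        smoothed[i] = coef[0]
    return smoothed


def variability(t, y, window_years=20.0):
    """Return (interannual, interdecadal) standard deviations."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    smooth = local_linear_smooth(t, y, window_years)
    interannual = np.std(y - smooth)                   # annual minus smoothed
    slope, intercept = np.polyfit(t, smooth, 1)        # trend of smoothed data
    interdecadal = np.std(smooth - (slope * t + intercept))
    return interannual, interdecadal


years = np.arange(1870, 2017)
trend_only = 0.01 * (years - 1870)
noisy = trend_only + 0.2 * (-1.0) ** np.arange(years.size)   # alternating +/-0.2 K
print(variability(years, trend_only))
print(variability(years, noisy))
```

A purely linear series yields zero for both measures, while year-to-year alternation shows up almost entirely in the interannual term, which is the separation the text relies on.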


Conclusions and impacts 

We have identified a characteristic SST fingerprint of an AMOC slow- 
down on the basis of high-resolution model simulations. The finger- 
print consists of a cooling in the subpolar gyre region due to reduced 
heat transport, and a warming in the Gulf Stream region due to a north- 
ward shift of the Gulf Stream. This fingerprint is most pronounced 
during winter and spring, and it is found in the observed long-term 
temperature trends, indicating a pronounced weakening of the AMOC
since the mid-twentieth century. 

We have also defined an improved SST-based AMOC index, which 
is optimized in its regional and seasonal coverage to reconstruct 
AMOC changes. Analysis of an ensemble of CMIP5 model simula- 
tions confirms that this index can very well reconstruct the long-term 
trend of the AMOC. We calibrated the observed AMOC decline to be
3 ± 1 Sv (around 15%) since the mid-twentieth century, and recon-
structed the evolution of the AMOC for the period 1870-2016. For 
recent decades, our reconstruction of the AMOC evolution agrees 
with the results of several earlier studies using different methods, 
suggesting that our AMOC index can also reproduce interdecadal 
variations. 

Our findings show that in recent years the AMOC appears to have 
reached a new record low, consistent with the record-low annual SST 
in the subpolar Atlantic (since observations began in 1880) reported 
by the National Oceanic and Atmospheric Administration for 2015. 
Surface temperature proxy data for the subpolar Atlantic suggest that 
“the AMOC weakness after 1975 is an unprecedented event in the past 
millennium”. This is consistent with the coral nitrogen-15 data that 
led Sherwood et al.?8 to conclude that “the persistence of the warm, 
nutrient-rich regime since the early 1970s is largely unique in the con- 
text of the last approximately 1,800 yr”. Although long-term natural 
variations cannot be ruled out entirely??°, the AMOC decline since 
the 1950s is very likely to be largely anthropogenic, given that it is a 
feature predicted by climate models in response to rising CO} levels. 
This declining trend is superimposed by shorter-term (interdecadal) 
natural variability. 

The AMOC weakening may already have an impact on weather in 
Europe. Cold weather in the subpolar Atlantic correlates with high 
summer temperatures over Europe, and the 2015 European heat wave 
has been linked to the record ‘cold blob’ in the Atlantic that year*’. 
Essentially, low subpolar SSTs were found to favour an air-pressure 
distribution that channels warm air northwards into Europe. Model 
simulations further suggest that an AMOC weakening could become 
the “main cause of future west European summer atmospheric circu- 
lation changes”*”, as well as potentially leading to increased storminess 
in Europe*’. AMOC weakening has also been connected to above- 
average sea-level rise at the US east coast*+?° and increasing drought 
in the Sahel’. 

Continued global warming is likely to further weaken the AMOC 
in the long term, via changes to the hydrological cycle, sea-ice loss 
and accelerated melting of the Greenland Ice Sheet, causing further 
freshening of the northern Atlantic*®?”. Given that the AMOC is 
one of the well documented ‘tipping elements’ of the climate system, 
with a defined threshold for collapse’, it is of considerable concern 
that the proximity of the Atlantic to this threshold is still poorly 
known38–41.
METHODS


Climate model simulations. The CM2.6 coupled global climate model was 
developed by the Geophysical Fluid Dynamics Laboratory of the National 
Oceanic and Atmospheric Administration. It includes an atmospheric general 
circulation model at an average horizontal resolution of 0.5° × 0.5° (50 km)
and an ocean circulation model at 0.1° × 0.1° (10 km)*!!?. The ocean
has 50 vertical levels and includes a sea-ice model. Two simulations were per-
formed that were both initialized from present-day ocean conditions, followed by
a spin-up time of 100 years at constant 1860 CO2 levels. The control simulation,
of 80 years’ duration, then maintained CO2 concentrations at the 1860 level;
in the experimental run, by contrast, atmospheric CO2 increased by 1% per
year over 70 years until it doubled, and then remained at this level for another 
10 years. Given the extremely high computational cost of this model (approximately 
one day per one year of simulation on a high-performance computer), no further 
simulations are available. 

Definition of the AMOC index. We define the AMOC index $I_{\mathrm{AMOC}}$ as the
difference between the mean SST of the geographic region that is most sensitive
to a reduction in the AMOC (the subpolar gyre region, sg) and that of the whole
globe:

$$I_{\mathrm{AMOC}} = \mathrm{SST}_{\mathrm{sg}} - \mathrm{SST}_{\mathrm{global}}$$


Rather than including the whole year, we instead use only the winter and spring 
months (November to May), because the AMOC signal found in the SST is most 
pronounced during these seasons (see Fig. 4). Thus the AMOC index for a certain 
year is defined as the mean SST in the subpolar gyre region for the following 
November–May season, minus the global mean SST for that season.
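A minimal NumPy sketch of the index, assuming monthly-mean regional and global SSTs are stored as (years × 12) arrays and reading "the following November–May season" as November and December of the index year plus January to May of the next year; that reading, the array layout and the toy input are assumptions:

```python
import numpy as np

# (year offset, month index with 0 = January) for November through May.
NOV_TO_MAY = [(0, 10), (0, 11), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]


def amoc_index(sst_sg: np.ndarray, sst_global: np.ndarray) -> np.ndarray:
    """I_AMOC for each year that has a complete following November-May season.

    sst_sg, sst_global: monthly-mean SST arrays of shape (n_years, 12) for the
    subpolar gyre region and the whole globe. Returns n_years - 1 values.
    The seasonal mean of the difference equals the difference of seasonal means.
    """
    diff = sst_sg - sst_global
    n_years = diff.shape[0]
    return np.array([
        np.mean([diff[k + dy, m] for dy, m in NOV_TO_MAY])
        for k in range(n_years - 1)
    ])


# Toy input: three years, anomalies only in November of year 0 and January of year 1.
sst_sg = np.zeros((3, 12))
sst_global = np.zeros((3, 12))
sst_sg[0, 10] = 7.0
sst_sg[1, 0] = 7.0
print(amoc_index(sst_sg, sst_global))
```

Both anomalies fall in the season assigned to year 0, so only that index value is non-zero.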

Definition of the subpolar gyre region. To define the region used to calculate 
the AMOC index (shown in the inset of Fig. 3), we assumed that SST differences 
in the subpolar North Atlantic relative to the global mean SST are dominated by 
variations in the AMOC. For this study, we determined this region by combining normalized linear SST trends from both the HadISST dataset and the high-resolution CM2.6 model run, as shown in Fig. 2. Grid cells that show relative
cooling in either the observations or the model were included in the definition. 
The region is large (compared, for example, with that used in ref. ?), which has 
the advantage that it should cover most of the area in which the heat transported 
northwards by the AMOC is vented to the atmosphere in the observations and in 
the models, especially considering that the exact location of heat release is, to some 
degree, model-dependent. The exact coordinates of the region are available in a 
public data repository (see Data availability). 
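The region selection can be sketched as two steps: normalize each cell's linear trend by the trend of the global mean, then keep cells inside a search box where either the observations or the model show relative cooling. The threshold (default 1, i.e. below-average warming) and the box mask are assumptions for illustration; the published region coordinates are the authoritative definition.

```python
import numpy as np


def normalized_trends(sst_annual, years, weights):
    """Linear trend of every grid cell divided by the trend of the area-weighted global mean.

    sst_annual: shape (n_years, n_lat, n_lon), assumed free of NaNs.
    weights: area weights, shape (n_lat, n_lon).
    """
    n, ny, nx = sst_annual.shape
    flat = sst_annual.reshape(n, -1)
    local = np.polyfit(years, flat, 1)[0].reshape(ny, nx)  # slope of each column
    global_mean = (sst_annual * weights).sum(axis=(1, 2)) / weights.sum()
    global_trend = np.polyfit(years, global_mean, 1)[0]
    return local / global_trend


def region_mask(norm_obs, norm_model, box, threshold=1.0):
    """Cells inside `box` where the observations OR the model fall below `threshold`."""
    return box & ((norm_obs < threshold) | (norm_model < threshold))
```

Taking the union of the two datasets is what makes the region large, which matches the stated aim of covering model-dependent heat-release locations.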

Definition of the Gulf Stream region. Similar to the subpolar gyre region, the Gulf Stream region is defined as the region that covers the above-average long-term warming east of the US coast that results from an AMOC slowdown in both the observations and the model (see inset of Fig. 3). Thus, the terms Gulf Stream region
and subpolar gyre region do not refer directly to ocean circulation features, but 
rather to SST features. The exact coordinates of the region are available in a public 
data repository (see Data availability). 

AMOC effects on Gulf Stream separation point and DWBC strength. We link the extreme warming observed along the US coast to the Gulf Stream shifting northwards and closer to shore as a consequence of an AMOC slowdown. For the MOM4 ocean model, it has been shown that the correct separation point of the Gulf Stream is achieved through a reasonable representation of the deep western boundary current (DWBC). Furthermore, it has been shown that, for this model, a weakening of the AMOC is accompanied by a weakening of the DWBC and that both are followed by a northward shift of the mean Gulf Stream path. Taken together, these results suggest that, in the model run, the warming is due to a weakened AMOC that leads to a weakened DWBC, a weakened northern recirculation gyre and a northward shift of the Gulf Stream separation point. To test this, we compared the evolution of the Gulf Stream path (represented by the Gulf Stream index, that is, the mean latitude of the 15 °C isotherm at a depth of 200 m in the Northwest Atlantic between 75° W and 55° W) with the AMOC strength at 26° N in the CM2.6 control run and the CO2-doubling run (Extended Data Fig. 4a).
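The Gulf Stream index can be sketched as follows, assuming a 200-m temperature field on a grid whose latitudes increase northwards and whose longitudes are in degrees east. Locating the isotherm as the first warm-to-cold crossing when moving north, with linear interpolation between grid points, is an assumption; columns without a crossing (for example land) are skipped.

```python
import numpy as np


def gulf_stream_index(temp200, lat, lon, iso=15.0, lon_range=(-75.0, -55.0)):
    """Mean latitude of the `iso` isotherm at 200 m depth between 75° W and 55° W.

    temp200: temperature at 200 m, shape (n_lat, n_lon), NaN over land.
    lat: latitudes increasing northwards; lon: degrees east (negative = west).
    """
    crossings = []
    for i in np.where((lon >= lon_range[0]) & (lon <= lon_range[1]))[0]:
        t = temp200[:, i]
        for j in range(len(lat) - 1):
            if t[j] >= iso > t[j + 1]:  # warm to the south, cold to the north
                frac = (t[j] - iso) / (t[j] - t[j + 1])
                crossings.append(lat[j] + frac * (lat[j + 1] - lat[j]))
                break
    return float(np.mean(crossings)) if crossings else np.nan
```

Applied to each model year, this gives a time series that can be set against the AMOC at 26° N; a larger value means a more northerly Gulf Stream path.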

We compared the AMOC strength with the summed southward deep-ocean transport (between depths of 1,000 m and 4,000 m) at 40° N in the region between the coast and 65° W, for the CM2.6 control run and the CO2-doubling run (Extended Data Fig. 4b). We found that the DWBC in the model indeed weakens as the AMOC slows down, and by a very similar amount (around 3.5 Sv). We calculated the DWBC at this latitude because it is just north of the region where the Gulf Stream and DWBC cross in the control run, and is thus the area where the northern recirculation gyre forms, which forces the Gulf Stream to deflect from the coast. These analyses confirm that the AMOC weakening in the model is indeed accompanied by a weakened DWBC and a northward shift of the Gulf Stream path.
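A sketch of the deep transport calculation on a zonal section at 40° N, assuming a regular longitude grid and meridional velocities in m/s. Whether "summed southward transport" means the net transport or only the southward-flowing cells is not stated; this version takes the net transport in the box and reports it with southward counted as positive.

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # m


def dwbc_transport(v, depth, dz, lon, dlon, lat=40.0, zrange=(1000.0, 4000.0), lon_max=-65.0):
    """Net transport (Sv, southward positive) through a zonal section at `lat`.

    v: meridional velocity in m/s, shape (n_depth, n_lon), NaN over land and below the sea floor.
    depth, dz: cell-centre depths and layer thicknesses in m, shape (n_depth,).
    lon: cell-centre longitudes in degrees east; dlon: grid spacing in degrees.
    Only cells between the coast and `lon_max` and within `zrange` are summed.
    """
    dx = np.deg2rad(dlon) * EARTH_RADIUS * np.cos(np.deg2rad(lat))  # cell width in m
    in_depth = (depth >= zrange[0]) & (depth <= zrange[1])
    in_lon = lon <= lon_max
    box = v[np.ix_(in_depth, in_lon)]
    area = dz[in_depth][:, None] * dx                                # cell face areas in m^2
    return -np.nansum(box * area) / 1e6                              # m^3/s to Sv, sign flipped
```

Comparing this series with the AMOC at 26° N in both runs is what yields the similar weakening of about 3.5 Sv reported above.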
Analysis of additional observational datasets. For this study, we analysed seven available SST data products. All of them show the fingerprint of the AMOC, namely, the cold patch in the subpolar Atlantic and, to a greater or lesser extent, the enhanced warming inshore of the Gulf Stream (Extended Data Fig. 5). Details of the different datasets are given in Extended Data Table 1. Different processing choices lead to differences in how spatial and temporal variability are represented in the datasets. We focused on the dataset with the best spatial resolution and advanced quality control, the HadISST data. Although the ERSST data are also quality-controlled, the use of empirical orthogonal teleconnections for post-processing smooths the SST signal in the spatial domain, which is unwanted for this study. The bias adjustments and quality-control procedures used for the COBE dataset, which is likewise high-resolution, are not as advanced as those used for the HadISST and ERSST data. The SODA data are an ocean reanalysis product, that is, they are based on model simulations with data assimilation.

Significance of the 1870-2016 trends. To illustrate the significance of the 1870-2016 linear trends, we compare the distribution of the long-term trends for all grid cells between 60° S and 75° N with the distributions of trends for the grid cells in the subpolar gyre region and in the Gulf Stream region (defined in the inset of Fig. 3). (We exclude the sea-ice-covered regions because they are expected to show no temperature trend, consistent with the assumption that SSTs remain close to the freezing point there.) Extended Data Fig. 3 shows the global distributions of relative SST trends for the HadISST data and the CO2-doubling run of the CM2.6 model. Using a constant bin size of 0.2, we determined the 5% and 95% quantiles. In all cases, the medians of the subpolar gyre and Gulf Stream regions lie within the lowest and highest 5% of the trends, respectively. The median of the Gulf Stream region in the HadISST data is 2.4 (that is, the warming here is 2.4 times larger than the global SST warming), higher than 96% of the SST trends; the median of the subpolar gyre region is −0.17, and thus among the lowest 3% of the trends. In the CO2-doubling run of the CM2.6 model, the AMOC fingerprint regions are even greater outliers, presumably because the larger global-warming signal and the associated greater AMOC weakening result in a better signal-to-noise ratio. The median of the Gulf Stream region in the model is 2.4, higher than 98% of the SST trends, and the median of the subpolar gyre region is −0.25, among the lowest 1% of the trends.
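The outlier statements above reduce to two numbers per region: its median normalized trend and the share of all grid cells with a smaller trend. The sketch below computes them together with the 5th and 95th percentiles. Unlike the text, it takes the percentiles directly from the samples rather than from a histogram with 0.2-wide bins, and it does not area-weight the grid cells; both are simplifying assumptions.

```python
import numpy as np


def median_outlier_stats(all_trends, region_trends):
    """Compare a region's median normalized trend with the distribution of all grid cells.

    Returns the region median, the percentage of all cells with a smaller trend,
    and the 5th and 95th percentiles of the full distribution.
    """
    all_trends = np.asarray(all_trends, dtype=float)
    all_trends = all_trends[np.isfinite(all_trends)]  # drop land or masked cells
    median = float(np.median(region_trends))
    pct_below = 100.0 * float(np.mean(all_trends < median))
    q05, q95 = np.percentile(all_trends, [5, 95])
    return median, pct_below, float(q05), float(q95)
```

A Gulf Stream median "higher than 96% of the SST trends" corresponds to `pct_below` of 96, and a subpolar median "among the lowest 3%" to `pct_below` of at most 3.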

Relation between the AMOC index and the overturning strength. To assess and calibrate the relation between changes in the AMOC index and the AMOC strength, we examined the AMOC index and the AMOC strength in simulations performed with 15 models in the context of CMIP5 for the historical (1870-2005) climate, extended to 2016 using simulations of the RCP8.5 scenario. To assess whether the models have a reasonable representation of the AMOC, we compared the mean maximum AMOC at 26° N for the model years 2005-2014 with the mean of the observed AMOC at around 26° N during that period (16.8 Sv; see Extended Data Table 1). We chose models with mean maximum AMOCs of 16.8 ± 10.0 Sv; this excluded the NorESM1-M and NorESM1-ME models. We further excluded the GISS-E2-R model because it is an outlier with a very unrealistic deep mixed layer that reaches down to the sea floor in most of the subpolar Atlantic.
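The selection can be reproduced from the mean AMOC values listed in Extended Data Table 1. The sketch applies the ±10 Sv window around the observed 16.8 Sv and then drops GISS-E2-R separately, since that exclusion rests on its mixed-layer behaviour rather than on the AMOC criterion. The variable and function names are hypothetical.

```python
OBSERVED_AMOC_26N = 16.8  # Sv, observed mean at ~26° N for 2005-2014 (value from the text)
TOLERANCE = 10.0          # Sv, accepted deviation from the observed mean

# Mean AMOC at 26° N (Sv) per model, copied from Extended Data Table 1.
MEAN_AMOC_26N = {
    "CanESM2": 14.76, "CCSM4": 17.72, "CESM1-BGC": 18.63,
    "CESM1-CAM5-1-FV2": 19.02, "CESM1-CAM5": 18.68, "CNRM-CM5": 12.33,
    "GFDL-ESM2M": 18.09, "GISS-E2-R": 18.22, "INMCM4": 17.73,
    "MPI-ESM-LR": 18.42, "MPI-ESM-MR": 16.67, "MRI-CGCM3": 14.42,
    "MRI-ESM1": 14.90, "NorESM1-ME": 29.56, "NorESM1-M": 28.69,
}

# Excluded on physical grounds (unrealistic deep mixed layer), not by the AMOC window.
EXCLUDED = {"GISS-E2-R"}


def select_models(amoc=MEAN_AMOC_26N, obs=OBSERVED_AMOC_26N, tol=TOLERANCE, excluded=EXCLUDED):
    """Keep models whose mean AMOC lies within obs ± tol and that are not excluded."""
    return sorted(m for m, v in amoc.items() if abs(v - obs) <= tol and m not in excluded)


print(len(select_models()))  # 12
```

Only the two NorESM1 variants (28.69 and 29.56 Sv) fall outside the window, so 12 of the 15 models remain.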

Total-least-squares fit. To test the relation between our AMOC index and the AMOC strength, we performed a total-least-squares fit (also known as an orthogonal regression, because errors in both variables are accounted for by minimizing the distances measured orthogonally to the regression line). The full regression equation is:

$$Y = 3.8\,\mathrm{Sv\,K^{-1}} \times X + 0.1\,\mathrm{Sv\ per\ century}$$

where X is the trend in the AMOC index, in kelvins per century, and Y is the corresponding trend in AMOC strength, in sverdrups per century.
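An orthogonal regression can be computed from the singular value decomposition of the centred data: the right singular vector with the smallest singular value is normal to the best-fit line. The sketch below assumes equal error variances in X and Y (the plain orthogonal case) and also applies the fitted equation above to convert an index trend into an AMOC trend; the function names are hypothetical.

```python
import numpy as np


def tls_fit(x, y):
    """Orthogonal (total least squares) fit y = slope * x + intercept, equal error variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    centred = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    nx, ny = vt[-1]            # normal vector of the line (smallest singular value)
    slope = -nx / ny
    return slope, ym - slope * xm


def amoc_trend_from_index(x_k_per_century, slope=3.8, intercept=0.1):
    """Convert an AMOC-index trend (K per century) to an AMOC trend (Sv per century)."""
    return slope * x_k_per_century + intercept
```

For example, an index trend of −0.5 K per century maps to 3.8 × (−0.5) + 0.1 = −1.8 Sv per century. An ordinary least-squares fit of Y on X would tend to underestimate the slope when X itself is noisy, which is the usual reason for preferring the orthogonal form here.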

Sensitivity to extension of the subpolar gyre region. The region chosen as the subpolar gyre region is, on average, largely free of sea ice (to analyse this, we compared the region with the average November-May sea-ice cover from the HadISST data). To explore how partly ice-covered areas influence the index, we limited the region to ice-free areas (determined by the maximum sea-ice cover for the November-May season from 1870 to 2016), and compared the resulting index with our original AMOC index (Extended Data Fig. 7). This shows some differences in the year-to-year variations, but the longer-term trend, especially in the last decades, is hardly affected. We therefore conclude that sea ice does not substantially affect the long-term evolution of our AMOC index.
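The reduced, always ice-free region can be sketched as a mask operation: take the maximum sea-ice concentration over every November-May month from 1870 to 2016 and keep only subpolar gyre cells where it is zero. Treating NaN values (for example land) as ice-free is an assumption; such cells lie outside the region mask anyway.

```python
import numpy as np


def ice_free_region(sg_mask, sea_ice):
    """Drop every cell of the subpolar gyre mask that was ever ice-covered.

    sea_ice: November-May sea-ice concentrations (fraction 0-1) for all years,
    shape (n_months, n_lat, n_lon); NaNs are treated as ice-free.
    """
    max_ice = np.nan_to_num(sea_ice, nan=0.0).max(axis=0)
    return sg_mask & (max_ice == 0.0)
```

The index is then recomputed with this mask in place of the original one and the two time series are compared.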

Comparison with a previous AMOC index. Rahmstorf et al. used a different region and different data (annual HadCRUT4 SSTs minus their annual Northern Hemisphere mean, both land and ocean) to obtain the AMOC index. We calculate the AMOC index relative to the global mean SST; however, as our comparison of the two indices shows, the index is not sensitive to this choice (Extended Data Fig. 6). Rahmstorf et al. also determined the conversion factor between their AMOC index and the actual AMOC by using only one model, MPI-ESM-MR. We updated their AMOC index with the latest data and compared it with the AMOC slowdown determined herein (Extended Data Fig. 6). The results that we obtained with both index definitions are highly consistent on the multidecadal timescale of interest.
Link to empirical modes of variability. Two main modes of variability have been defined in the North Atlantic, primarily on the basis of empirical data: the North Atlantic Oscillation (NAO) and the Atlantic Multidecadal Oscillation (AMO). The former describes atmospheric variability, with an index based on the surface pressure gradient, whereas the latter describes SST variability relative to the global mean, similar to our AMOC index but including Atlantic SSTs down to the Equator. Both the NAO and the AMO indices are correlated with our AMOC index (Extended Data Figs. 8, 9).

For the AMO index this is not surprising, given that it shares the subpolar SST data with our AMOC index. However, the usefulness of the AMO index is limited by the fact that it conflates subpolar SST variability and tropical SST variability into one index. For our purpose of using SSTs to deduce AMOC variations, this degrades the signal-to-noise ratio. Furthermore, the decadal variations in our AMOC index are similar to those of the AMO index (Extended Data Fig. 8b), in accordance with other studies showing that the time evolution of the AMO can at least partly be explained by changes in Atlantic Ocean currents. Yet because the AMO conflates two regions with different long-term trends (the subpolar North Atlantic, which is cooling, and the tropics and subtropics, which are warming at or above the rate of the global mean; Fig. 2), it does not show the 1870-2016 negative trend that is clearly visible in our AMOC index (Extended Data Fig. 8a).

The NAO index is more useful, as it can be used to study the relationship between atmospheric-pressure variability and North Atlantic SSTs. We find a clear negative correlation, R = −0.54, between the decadally smoothed time series of the AMOC and NAO indices, which occurs when the AMOC leads the NAO by three years (see Extended Data Fig. 9b). This negative correlation, and the fact that a pronounced cooling in the subpolar North Atlantic has been shown to be followed by a positive phase of the NAO, suggest that on interdecadal timescales the AMOC at least partially drives NAO changes via changes in North Atlantic SSTs, rather than the other way round. Consistent with this, the NAO index shows a positive trend for 1870-2016 (Extended Data Fig. 9a). A positive NAO, in turn, helps to extract heat from the subpolar ocean through enhanced westerly winds over that region, cooling SSTs, enhancing convection and increasing ocean density. This acts as a negative feedback on an AMOC weakening. Such a delayed negative feedback could either dampen the AMOC response or lead to oscillatory behaviour. Further investigation of this linkage is beyond the scope of this study. Nevertheless, our work supports the importance of ocean circulation to variations in the North Atlantic SST pattern, which has been highlighted previously.
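The lead-lag statement can be made concrete with a lagged Pearson correlation, where a positive lag means that the first series leads the second. The sketch below is a generic implementation, not the authors' script; in this setting the inputs would be the decadally smoothed AMOC and NAO indices, and the lag with the most negative correlation is the one reported.

```python
import numpy as np


def lagged_correlation(a, b, max_lag):
    """Pearson correlation between a(t) and b(t + lag) for lag in [-max_lag, max_lag].

    A positive lag means that `a` leads `b`.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:n - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:n + lag]
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result


# Usage: r = lagged_correlation(amoc_smoothed, nao_smoothed, 10); best_lag = min(r, key=r.get)
```

Smoothing strongly reduces the number of independent samples, so the usual p-values for a Pearson correlation do not apply directly to smoothed series; a significance level such as the one shown in Extended Data Fig. 9b has to account for this.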

Code availability. Code for running the CM2.6 experiment is available from http://www.gfdl.noaa.gov/. Scripts for analysing the data are available from the corresponding authors upon reasonable request.

Data availability. The SST datasets analysed here are publicly available; detailed information is given in Extended Data Table 1. The CMIP5 model output is available from https://esgf-node.llnl.gov/projects/cmip5/. The CM2.6 model output is available from V.S. (vincent.saba@noaa.gov) upon reasonable request. The exact definitions of the subpolar gyre and Gulf Stream regions, as well as the SST anomalies of these regions, are available in a public data repository: http://www.pik-potsdam.de/~caesar/AMOC_slowdown/. The data for the GloSea reanalysis were provided by L. Jackson. The data for the reconstruction from satellite altimetry and cable measurements were provided by E. Frajka-Williams. RAPID data are available from http://www.rapid.ac.uk/rapidmoc/rapid_data/datadl.php.
Extended Data Fig. 1 | Normalized SST trends in the HadISST data for different time periods. Observed linear SST trends (using annual HadISST data), calculated for different timespans to test the robustness of the linear SST trend pattern to the starting and ending years of the timespan. The pattern is normalized with the respective global mean SST trend. Regions that show below-average warming or cooling are in blue; regions that show above-average warming are in red.
[Figure: colour scale of local SST trend/global SST trend, from −3 to 5.]

Extended Data Fig. 2 | Comparison of global normalized SST trends. Linear SST trends during a CO2-doubling experiment using the GFDL CM2.6 climate model (top), and observed trends during 1870-2016 (HadISST data, bottom), both normalized with the respective global mean SST trends and using data from the November-May season. Regions that show cooling or below-average warming are in blue; regions that show above-average warming are in red. Note again that owing to the much greater climate change in the CO2-doubling experiment, the signal-to-noise ratio for the modelled SST trends is better than that for the observations, and thus the noise level is suppressed by the normalization.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


ARTICLE 


[Figure: histograms of probability versus local SST trend/global SST trend for a, HadISST data and b, CM2.6 data, with the medians of the subpolar gyre (sg) and Gulf Stream (gs) regions marked.]


Extended Data Fig. 3 | Histograms showing the distribution of the normalized longer-term trends. a, The distribution (grey bars) of all local trends, normalized to the global trends, from the HadISST data for 1870-2016, for latitudes between 60° S and 75° N. The distribution is centred around μ = 1 with a standard deviation of σ = 0.66. The 5th and 95th percentiles are marked in darker grey. The distribution of the 1870-2016 trends for grid cells assigned to the subpolar gyre region is shifted to lower or even negative values, with a median of X_sg = −0.17 (blue). The distribution of trends for grid cells in the Gulf Stream region is shifted to higher values, with a median of X_gs = 2.4 (red). The distributions are normalized to account for the different sample sizes of the global, subpolar gyre and Gulf Stream regions. b, As for panel a, but for the CO2-doubling run of the CM2.6 model, with μ = 1.1, σ = 0.48, X_sg ≈ 0.02 and X_gs ≈ 2.4. The standard deviations of the model data are expected to be smaller than those of the observations because of the larger climate-change signal by which the model data are normalized; this reduces the 'noise' of short-term variability relative to the climate signal.
Extended Data Fig. 4 | Influence of the AMOC on the separation point of the Gulf Stream. a, The evolution of the Gulf Stream (GS) separation point compared with the AMOC strength in the CM2.6 control and CO2-doubling runs, as indicated by the Gulf Stream index. The graph shows a link between a weaker AMOC and a northward shift of the separation point. b, Time series of the southward transport of the deep ocean current (summed between depths of 1,000 m and 4,000 m) at 40° N in the region between the US coast and 65° W (see Methods), showing a weakening DWBC during the CO2-doubling experiment. The thin lines show annual values, the thick lines show the 20-year LOWESS-smoothed values.
Extended Data Fig. 5 | Linear SST trends from a CO2-doubling experiment using the GFDL CM2.6 climate model, and observed long-term trends from different SST data products, normalized with the respective global mean SST trends. The trend from 1870 to 2016 was calculated using those datasets that provide data until the present (HadISST, ERSSTv5, ERSSTv4, ERSSTv3b and Kaplan). Otherwise, it was calculated from 1870 to the end of the available time period (SODA and COBE; see Extended Data Table 1). The SODA data are given for a depth of 5 m instead of the surface; thus, the long-term trend differs for regions with ice cover. For the SODA data, the normalization was adjusted with surface SST data instead of the data at a 5-m depth, to make this dataset comparable to the others. All datasets show a prominent cooling in the subpolar gyre region; the high-resolution data (HadISST, COBE and SODA) also show pronounced warming in the Gulf Stream region.
Extended Data Fig. 6 | Time series of the AMOC anomaly for two definitions of the AMOC index. We calculated the AMOC anomaly from two AMOC indices and two model-based conversion factors. In red is the AMOC anomaly as defined by Rahmstorf et al. (HadCRUT4 data), updated with the latest data to 2016. In blue is the AMOC anomaly as defined herein (HadISST data). Thick lines are smoothed by a 10-year LOWESS filter. This smoothing window is shorter than that used in Fig. 6, so that the two indices can be compared at a higher time resolution.
Extended Data Fig. 7 | Sensitivity to the extension of the subpolar gyre region regarding sea-ice cover. a, Left panel, our original subpolar gyre region (blue outline) and the average November-May sea-ice cover from 1870 to 2016 (blue shading, from HadISST data). Right panel, a reduced subpolar gyre region (green outline) that is always ice-free, compared with the maximum sea-ice cover for the November-May season from 1870 to 2016. b, Comparison of the AMOC indices based on these two regions. The thin lines show annual values, the thick lines show the 20-year LOWESS-smoothed values.




[Figure: a, AMOC index and AMO index, annual and smoothed; b, detrended AMOC index (interdecadal AMOC variability) and AMO index, annual and smoothed. Left axis, SST anomaly (K); right axis, AMO index (K); horizontal axis, year (1880-2020).]


Extended Data Fig. 8 | Comparison of interdecadal variability of the AMOC index and the AMO index. a, We calculated the AMO index from the HadISST dataset after Trenberth and Shea. This index is defined as the weighted mean SST over the North Atlantic (0° N to 80° N), relative to the mean SST from the period 1901-1970, but with the global mean SST (averaged over the global oceans from 60° S to 60° N) removed. The thin lines show annual values, the thick lines indicate the 20-year LOWESS-smoothed values. We show our AMOC index for comparison. b, As for panel a, but here the AMO index is compared with the interdecadal variability of our AMOC index, that is, the detrended 20-year LOWESS-smoothed index. The comparison shows that the AMO index has interdecadal variability similar to that of the AMOC index but lacks the climatic trend found in the latter.
Extended Data Fig. 9 | Comparison of AMOC and NAO. a, Comparison of our AMOC index with the interdecadal variability in the NAO index (after Hurrell), calculated as the sea-level pressure at the Lisbon station minus the sea-level pressure at the Stykkisholmur/Reykjavik station for the months December to March (DJFM). The thin lines show annual values, the thick lines show the 20-year LOWESS-smoothed values. The linear trend over the whole time period is shown with dashed lines. b, Lagged cross-correlation between the AMOC index and the NAO index shows that peak negative correlation occurs when the AMOC leads the NAO by three years, with R = −0.54. The red lines mark the 95% significance level.
Extended Data Table 1 | Detailed data and model information

SST datasets

HadISST1 (Merged Hadley-NOAA/OI SST). Resolution: 1.0° lat × 1.0° lon, monthly. Period of record: 01/1870-present. Input data: SST data taken from the Met Office Marine Data Bank (MDB), ICOADS and Global Telecommunication System (GTS) data. Processing steps: two-stage reduced-space optimal interpolation (RSOI) procedure, followed by superposition of quality-improved gridded observations onto the reconstructions. Source: http://www.metoffice.gov.uk/hadobs/hadisst/data/download.html. Download date: 10/21/2016 (updated on 04/04/2017).

ERSSTv4 (NOAA Extended Reconstruction SST version 4). Resolution: 2.0° lat × 2.0° lon, monthly. Period of record: 01/1854-present. Input data: ICOADS Release 2.5 and NCEP GTS data. Processing steps: updated quality control and bias correction with marine nighttime air temperatures; updated fitting with empirical orthogonal teleconnections (EOTs). Source: https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v4. Download date: 11/19/2016 (updated on 06/07/2017).

ERSSTv3b (NOAA Extended Reconstruction SST version 3b). Resolution: 2.0° lat × 2.0° lon, monthly. Period of record: 01/1854-present. Input data: ICOADS Release 2.4 and NCEP GTS data. Processing steps: quality control and bias correction with marine nighttime air temperatures; fitting with empirical orthogonal teleconnections (EOTs). Source: https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v3b. Download date: 01/23/2017.

SODA (Simple Ocean Data Assimilation, 2.2.4). Resolution: 0.5° lat × 0.5° lon, 40 vertical levels, monthly. Period of record: 01/1870-12/2010. Input data: assimilation of a model forecast and observation data (including surface temperature and salinity observations of various types, and nighttime infrared satellite SST data). Processing steps: assimilation product with quality control. Source: http://dsrs.atmos.umd.edu/DATA/soda_2.2.4/. Download date: 01/23/2017 (updated on 02/21/2017).

COBE (Centennial in situ Observation-Based Estimates). Resolution: 1.0° lat × 1.0° lon, monthly. Period of record: 01/1850-12/2015. Input data: ICOADS Release 2.5. Processing steps: gridding via optimal interpolation (OI) as well as bias adjustment and quality control. Source: https://www.esrl.noaa.gov/psd/data/gridded/data.cobe2.html. Download date: 02/01/2017.

Kaplan (Kaplan Extended SST v2). Resolution: 5.0° lat × 5.0° lon, monthly. Period of record: 01/1856-present. Input data: MOHSST5 version of the GOSTA dataset from the UK Met Office. Processing steps: adjustments and filling via EOF projection, optimal interpolation (OI), Kalman filter (KF) forecast, KF analysis and an optimal smoother (OS). Source: https://www.esrl.noaa.gov/psd/data/gridded/data.kaplan_sst.html. Download date: 02/01/2017.

ERSSTv5 (NOAA Extended Reconstruction SST version 5). Resolution: 2.0° lat × 2.0° lon, monthly. Period of record: 01/1854-present. Input data: ICOADS Release 3.0 and NCEP GTS data. Processing steps: improved SST spatial and temporal variability by reducing spatial filtering in training the reconstruction EOTs, removing high-latitude damping in EOTs and adding 10 more EOTs in the Arctic; switch from nighttime marine air temperature (NMAT) to buoy SST as the reference for correcting ship SST biases. Source: https://www1.ncdc.noaa.gov/pub/data/cmb/ersst/v5/netcdf/. Download date: 08/14/2017.

CMIP5 models

| Model | Institute | Average resolution of oceanic model | Mean AMOC at 26° N (Sv) | AMOC index decline (K per century) | AMOC decline (Sv per century) |
|---|---|---|---|---|---|
| CanESM2 | CCCMA (Canada) | 256 × 192 | 14.76 | −0.16 | −0.1 |
| CCSM4 | NCAR (USA) | Nominal 1° (1.125° in longitude, 0.27-0.64° variable in latitude) | 17.72 | −0.57 | −2.2 |
| CESM1-BGC | NCAR (USA) | Nominal 1° (1.125° in longitude, 0.27-0.64° variable in latitude) | 18.63 | −0.71 | −2.7 |
| CESM1-CAM5-1-FV2 | NCAR (USA) | Same as CESM1-BGC | 19.02 | −0.19 | −0.5 |
| CESM1-CAM5 | NCAR (USA) | Same as CESM1-BGC | 18.68 | −0.21 | −0.5 |
| CNRM-CM5 | CNRM-CERFACS (France) | 0.7° on average, ORCA1 | 12.33 | −0.54 | −2.1 |
| GFDL-ESM2M | NOAA GFDL (USA) | 1° tripolar, 360 × 200 L50 | 18.09 | −0.78 | −2.6 |
| GISS-E2-R | NASA GISS (USA) | 0.2 to 1° latitude × 1° longitude | 18.22 | 0.02 | −2.1 |
| INMCM4 | INM (Russia) | 1° × 0.5° in longitude and latitude, generalized spherical coordinates with poles displaced | 17.73 | −0.40 | −2.5 |
| MPI-ESM-LR | MPI-M (Germany) | Average 1.5°, GR15 | 18.42 | −0.54 | −2.0 |
| MPI-ESM-MR | MPI-M (Germany) | Approx. 0.4°, TP04 | 16.67 | −0.06 | −0.6 |
| MRI-CGCM3 | MRI (Japan) | 1.0° × 0.5° | 14.42 | −0.08 | −0.1 |
| MRI-ESM1 | MRI (Japan) | 1.0° × 0.5° | 14.90 | −0.10 | −0.4 |
| NorESM1-ME | NCC (Norway) | 1.125° along the equator | 29.56 | −0.45 | −0.1 |
| NorESM1-M | NCC (Norway) | 1.125° along the equator | 28.69 | −0.27 | −0.7 |


Overview of the spatial and temporal resolution, period of record, input data, processing steps and sources of the 7 datasets that we used to study the AMOC slowdown, as well as details of the 15 CMIP5 models used (for more detail, see Table 9.A.1. of ref. ©).
RNA viruses


Mang Shi, Xian-Dan Lin, Xiao Chen, Jun-Hua Tian, Liang-Jun Chen, Kun Li, Wen Wang, John-Sebastian Eden, Jin-Jin Shen, Li Liu, Edward C. Holmes & Yong-Zhen Zhang


Our understanding of the diversity and evolution of vertebrate RNA viruses is largely limited to those found in mammalian and avian hosts and associated with overt disease. Here, using a large-scale meta-transcriptomic approach, we discover 214 vertebrate-associated viruses in reptiles, amphibians, lungfish, ray-finned fish, cartilaginous fish and jawless fish. The newly discovered viruses appear in every family or genus of RNA virus associated with vertebrate infection, including those containing human pathogens such as influenza virus and the Arenaviridae and Filoviridae families, and have branching orders that broadly reflect the phylogenetic history of their hosts. We establish a long evolutionary history for most groups of vertebrate RNA virus, and support this by evaluating evolutionary timescales using dated orthologous endogenous virus elements. We also identify new vertebrate-specific RNA viruses and genome architectures, and re-evaluate the evolution of vector-borne RNA viruses. In summary, this study reveals diverse virus-host associations across the entire evolutionary history of the vertebrates.


RNA viruses infect a wide range of hosts and exhibit enormous genetic and phenotypic diversity. Because of their potential effect on public health and the agricultural industries, considerable attention has been directed towards describing the diversity and evolution of RNA viruses associated with vertebrates. Despite an increasingly widespread surveillance of invertebrate and vertebrate hosts, there are few direct links between invertebrate and vertebrate viruses, and vertebrate viruses tend to form monophyletic groups that are only distantly related to viruses found in invertebrates. Within vertebrates, there has been a marked sampling bias towards mammals and birds, even though they represent only a small proportion of total vertebrate diversity. Far less is known about the viruses infecting fish, amphibians and reptiles, despite their abundance, phenotypic diversity and central role in vertebrate evolution. Notably, the relatively few viruses from fish, amphibians and reptiles documented so far tend to form divergent lineages with respect to known vertebrate RNA viruses, which in part probably reflects the position of these hosts in the vertebrate phylogeny. However, the extent of viral phylogenetic and genomic diversity in these taxa, their ancestry, and the relative frequencies of virus-host co-divergence versus cross-species transmission in the evolution of vertebrate RNA viruses remain uncertain. To better understand the origin and evolutionary history of vertebrate viruses, we screened for RNA viruses in a diverse set of species that covered much of the phylogenetic diversity of the vertebrates, including those basal vertebrate lineages in which viruses have only rarely been documented.


Expanding diversity of vertebrate viruses 

We performed a large-scale meta-transcriptomics survey of potential vertebrate-associated RNA viruses in more than 186 host species representing the extensive diversity within the phylum Chordata (Fig. 1a, b, Supplementary Table 1). This included animals from the classes Leptocardii (lancelets), Agnatha (jawless fish), Chondrichthyes (cartilaginous fish), Actinopterygii (ray-finned fish), Sarcopterygii (lungfish), Amphibia (frogs, salamanders and caecilians) and Reptilia (snakes, lizards and turtles). We extracted total RNA from the gut, liver, and lung or gill tissue of these animals, which was then organized into 126 libraries for high-throughput RNA sequencing (Supplementary Table 1). In total, we generated 806 billion bases of sequence reads that were assembled and screened for RNA viruses. Although a very large number of viruses was discovered, we focused on vertebrate-associated viruses, including vertebrate-specific viruses that exhibited relatively close evolutionary relationships to known virus families or genera thought to infect only vertebrate hosts, and 'vector-borne' viruses that are able to infect both vertebrate and invertebrate hosts (Supplementary Table 2). In the resultant phylogenies, the newly discovered viruses either grouped within these families/genera, or fell as immediate sister-groups (Extended Data Figs. 1 and 2). Because the host spectrum of the vertebrate-specific virus families or genera is relatively restricted and generally does not contain viruses associated with other host types, we assume that vertebrates were their principal hosts, rather than any eukaryotic or prokaryotic microorganisms also present in the samples. Furthermore, at least 24% of the viruses were recovered from different tissues from the same individual and hence are likely to cause systemic infection (Supplementary Table 2). This gives further support to a direct association with the vertebrate host in which they were sampled.

Fig. 2 | Evolutionary history of 17 major vertebrate-specific virus families or genera. Each phylogenetic tree was estimated using a maximum likelihood method, and is rooted using the corresponding broader scale tree that contains both vertebrate and invertebrate viruses (not shown). Within each phylogeny, the viruses newly identified here are marked with solid black circles. Host groups are indicated with different colours: mammals (red), birds and reptiles (yellow), amphibians (green), lungfish (deep blue), ray-finned fish (blue), cartilaginous fish (purple) and jawless fish (grey). The name of the virus family or genus is shown above each phylogeny, and the lower-order virus taxonomy is shown to the right when applicable.

Fig. 3 | Long-term evolutionary relationships between vertebrate hosts and their associated viruses. a, b, Comparisons of hepatovirus (a) and influenza virus (b) phylogenies and their corresponding host phylogenies are presented as examples of virus-host co-divergence and host-switching, respectively. c, Estimation of co-phylogenetic events across the history of vertebrate-associated RNA viruses. Each boxplot illustrates the estimated median (centre line), upper and lower quartiles (box limits), 1.5 × interquartile range (whiskers) and outliers (points) of the co-divergence (red), duplication (blue), host-switching (green) and extinction (brown) events. Data from each estimation (hollow circles) are shown as overlays if fewer than 10 'solutions' are provided.

In total, we identified 214 distinctive and previously undescribed 
putative viral species of vertebrates, of which 196 can be considered 
vertebrate-specific (Fig. 1b, Supplementary Table 2). Hence, these data 
reveal that RNA viruses are present in greater numbers and diversity 
in vertebrates other than birds and mammals than previously realized 
(Fig. 1c). In particular, it was notable that every vertebrate-specific viral 
family or genus known to infect mammals and birds is also present in 
amphibians, reptiles or fish (Fig. 1d). For most of the families or genera, 
the previously known hosts were either mammals (the Arteriviridae, 
Filoviridae, Hantaviridae and rubivirus) or mammals, birds and reptiles 
(Arenaviridae, Astroviridae, Bornaviridae, Coronavirinae, influenza 
virus and rotavirus). This is the first time, to our knowledge, that these 
viral groups have been identified in fish and/or amphibians (Fig. 1d). 
Particularly notable was the presence of divergent members of the
Arenaviridae, Filoviridae and Hantaviridae families in ray-finned fish,
suggesting that these previously mammal-dominated groups have relatives
in aquatic vertebrates (Fig. 2). Similarly, for those virus groups
previously known to contain fish viruses (Caliciviridae, Hepeviridae,
Paramyxoviridae and Picornaviridae), we were able to greatly expand
their genetic diversity, which now covers more phylogenetic space
than that of their mammalian counterparts (Fig. 2). Of particular note
was influenza virus, for which we documented new viruses in jawless
fish (hagfish), amphibians (Asiatic toad) and ray-finned fish (spiny
eel), with the latter forming a sister-group to human influenza B
virus (Fig. 2). Finally, the viruses newly described in reptiles,
amphibians and fish exhibited tissue tropisms similar to those of their
mammalian counterparts”, which again argues for their antiquity. For
example, among the viruses discovered here, those of the Hepacivirus
genus were mainly found in the liver, whereas members of the
Picornaviridae, Caliciviridae and Astroviridae families dominated in
the gut (Fig. 1e).


Long-term virus-host evolutionary relationship 

On the basis of the distribution of host taxa on the virus tree, these 
data also revealed that virus phylogenetic history can mirror that of 
their hosts over long evolutionary timescales. Most notably, viruses 
from fish tend to fall basal to viruses in amphibians, reptiles, birds 
and mammals, reflecting their divergent phylogenetic position within 
vertebrates (Figs. 2 and 3a). This was supported by the observation that 
the virus phylogenies exhibited significant clustering by host taxonomy 
(that is, class), with P <0.001 in the association index (AI)!° for all 
family and genus comparisons, with the exception of influenza virus 
and rotavirus (Table 1). However, despite this overall host clustering, 
these data also revealed many examples of host-switching during virus 
evolutionary history. For example, the influenza virus identified in
ray-finned fish was the closest relative of mammalian influenza B virus
(76% amino acid identity), whereas the influenza viruses sampled from
other tetrapods were more divergent (approximately 30-62% amino
acid identity; Fig. 3b). Similarly, the viruses identified in lungfish (in
Picornaviridae, hepacivirus and aquareovirus; Fig. 2) were more closely
related to those from ray-finned fish than to those from tetrapods, with
which lungfish share a more recent common ancestor'!.
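The amino acid identities quoted above (for example, 76% and 30-62%) are pairwise percentages. A minimal sketch of one common way to compute such a value from two already-aligned protein sequences follows; the text does not state how alignment gaps were treated, so skipping gapped columns is an assumption here, and the function name is illustrative.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two pre-aligned protein sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped;
    this is one common convention, not necessarily the one used in the study.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: not counted
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared
```

For example, `percent_identity("MKV-LA", "MRVQLA")` compares five gap-free columns, four of which match, giving 80.0.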


Table 1 | Phylogenetic test of virus-host association and co-divergence

Test of host structure at the level of vertebrate class: association index ratio* and P value (AI).
Test of virus-host co-divergence: co-divergences, number of costs and P value (no. of costs).

Virus group                | Association index ratio* | P value (AI) | Co-divergences | No. of costs | P value (no. of costs)
Arenaviridae               | 0.0000 | <0.001 | 10-12 | 27  | <0.0
Arteriviridae              | 0.4960 | 0.047  | 13    | 11  | <0.0
Astroviridae               | 0.0878 | <0.001 | 17-21 | 68  | <0.0
Bornaviridae               | 0.0020 | <0.001 | 4-5   | 10  | 0.05
Caliciviridae              | 0.2736 | <0.001 | 12-13 | 42  | <0.0
Coronavirinae              | 0.0834 | <0.001 | 11-13 | 37  | <0.0
Filoviridae                | 0.0017 | <0.001 | 3     | 6   | 0.06
Hantavirus                 | 0.0022 | <0.001 | 12-17 | 22  | <0.0
Hepacivirus                | 0.0002 | <0.001 | 13-15 | 23  | <0.0
Hepeviridae                | 0.2935 | <0.001 | 4-6   | 12  | 0.13
Influenza virus            | 0.9173 | 0.65   | 2     | 8   | 0.8
Orthoreo- and aquareovirus | 0.1015 | <0.001 | 7     | 18  | <0.0
Paramyxoviridae            | 0.0231 | <0.001 | 20-22 | 44  | <0.0
Picornaviridae             | 0.0294 | <0.001 | 14-15 | 122 | <0.0
Rotavirus                  | 0.9275 | 0.34   | 3-5   | 7   | 0.52
Torovirinae                | 0.0072 | <0.001 | 6     | 7   | <0.01

*The association index (AI) ratio is calculated as ‘observed association index/null association
index’, in which the null association index is derived from 1,000 tree-tip randomizations. A ratio
closer to 0 indicates a stronger host structure. The ‘P values (AI)’ are outcomes from a Bayesian
tip-association significance test (BaTS)!°, and are derived from 1,000 tree-tip randomizations without
adjustment for multiple comparisons. The cost (that is, non-co-divergence) scheme included
‘host-switching’, ‘host duplication’, ‘host loss’ and ‘failure to diverge’ events, as specified in the
model. The ‘P values (no. of costs)’ are outcomes from a co-phylogeny test!, and are derived
from 100 tip-mapping randomizations without adjustment for multiple comparisons.
We next performed a more rigorous co-phylogenetic analysis!*"? of
the resemblance between the virus and host phylogenies at the species 
level. This revealed significantly more virus—host co-divergence than 
expected by chance alone (Table 1). However, these data also clearly 
show that host-switching has been commonplace during the evolution- 
ary history of vertebrate RNA viruses, and is often more frequent than 
co-divergence across the phylogenies as a whole (Fig. 3c). Aside from 
phylogenetic position, host-switching is also suggested by the observa- 
tion that single viruses are occasionally associated with multiple host 
species or even multiple host orders (such as Beihai fish astrovirus 1 
and Wenling fish picornavirus 1; Supplementary Table 2). Collectively, 
these results suggest that there is a long-term association between the 
RNA viruses and their vertebrate hosts that stretches many millions of 
years, but that cross-species transmission has occurred frequently on 
this background of co-divergence. 

To better determine the co-divergence history, we examined the tem- 
poral congruence between virus and host evolutionary histories'*’°. 
As the large genetic distances between these viruses preclude molecular
clock-based studies using heterochronous sequences!®!7, a more
profitable approach involves the comparison of exogenous viruses and 
their endogenous relatives'®. Previous studies have identified several 
dating calibration points in the Filoviridae!? and Bornaviridae'®”° 
families based on the presence of orthologous copies of endogenous 
virus elements (EVEs) in the genomes of related mammalian species 
with known times of divergence. Importantly, the viruses newly dis- 
covered here in ray-finned fish greatly expand the diversity in both 
the Bornaviridae (Fig. 4a) and Filoviridae (Fig. 4b). As a result, both 
the EVE clades and the calibration points (50 million years (Myr) ago 
and 30 Myr ago for the Bornaviridae and Filoviridae, respectively)'*" 
were now deeply nested within the diversity of exogenous viruses, with 
phylogenetic positions that were relatively distant from the root of the 
tree. This suggests that both virus families have ancient evolutionary histories
that extend well beyond the calibration dates. Although no orthologous 
EVEs were found in the positive-sense and double-stranded RNA virus 
families studied here, that their (exogenous) protein sequence diver- 
gence was comparable to that of the Bornaviridae and Filoviridae is also 
compatible with long evolutionary histories. 


Additional vertebrate-associated viruses 

We discovered two potentially new groups of vertebrate-associated 
viruses: one distantly related to the Astroviridae and Potyviridae families, 
and another nested within the newly characterized Chuvirus group”! 
(Extended Data Fig. 3). Several pieces of evidence support the associ- 
ation of these viruses with vertebrates: (i) they appear in several tissue 
types (gut, gill and liver), indicative of systemic infection (Extended 
Data Fig. 3); (ii) a search of the Transcriptome Shotgun Assembly 
(TSA) sequence database revealed that related viral sequences 
were found only in vertebrate transcriptomes, again involving 
several tissue types (Extended Data Fig. 3); and (iii) in the case of the 
vertebrate-associated chuviruses, EVEs were found in the genomes of 
several species of ray-finned fish (Extended Data Fig. 3). 

In addition to the vertebrate-specific viruses, we discovered viruses 
in amphibians, fish and reptiles from genera that have previously been 
associated with vector-borne virus transmission, most notably alpha- 
viruses, dimarhabdoviruses and flaviviruses. Among these, Wenzhou 
shark flavivirus is the first member of the Flavivirus genus identified 
in cartilaginous fish, and was found in all the tissue types analysed,
compatible with a systemic infection (Supplementary Table 2). In
the phylogeny, Wenzhou shark flavivirus falls basal to the ‘classic’
vector-borne and insect-specific flaviviruses, and is more closely
related to Tamana bat virus, which has no known vector species (Extended
Data Fig. 4). Similarly, in the case of the alphaviruses and dimarhabdo- 
viruses, the fish viruses discovered here clustered with other fish viruses 
reported previously to form lineage(s) basal to those associated with 
vector-borne viruses (Extended Data Fig. 4). This complex mix of vectored
and non-vectored viruses, with clear cases of the secondary loss of
vector-borne transmission (Extended Data Fig. 4), raises the question
of whether some of the vector-borne viruses were ultimately derived
from vertebrate-specific or vector-specific viruses, or whether the ability
to infect both arthropods and vertebrates is the ancestral phenotype”.

Fig. 4 | Evaluating the timescale of vertebrate virus evolution using
EVEs. a, b, Phylogenies were based on the exogenous and endogenous
nucleoproteins of the Bornaviridae (a) and Filoviridae (b) families.
Within the trees, endogenous virus elements (EVEs) are highlighted with
blue triangles, and the (divergent) exogenous viruses discovered in fish
are highlighted with green squares. The nodes that represent orthologous
clusters are highlighted with red circles; their associated divergence times
are shown next to the nodes, and their corresponding names are given to
the right of the phylogeny.

Fig. 5 | Evolution of vertebrate-associated virus genomes. Representative
genomes from 12 vertebrate-associated virus families or genera are shown.
The regions that encode major functional proteins or protein domains are
labelled on each genome. Homologous regions within or between
viral families are connected by orange dotted lines. Host associations
are labelled to the right of each genome using solid circles of different
colours. Positive-sense genomes are shown from 5′ to 3′ and
negative-sense genomes from 3′ to 5′; the orientation of ambisense
genomes (that is, arenaviruses) is indicated using arrows.
More detailed depictions of genome evolution are presented in Extended
Data Figs. 5 and 6.


Genome evolution of vertebrate RNA viruses

The annotation of the virus genomes newly documented here
showed a wider variety of genome architectures for vertebrate virus
families or genera than previously observed’, some of which may
represent the ancestral types in the evolutionary history of these
viruses (Fig. 5). Although the structures of these vertebrate virus
genomes were more conserved than those of invertebrate viruses)®7!23,
they still exhibited extensive variation, including in genome
length (hepacivirus), the organization of open reading frames
(Caliciviridae), the complete reconfiguration of the genome
downstream of the non-structural genes (Arteriviridae), changes
in the order and number of glycoproteins (Paramyxoviridae),
inter-species reassortment involving the M segment (hantavirus),
inter-family recombination involving the capsid protein
(Astroviridae and Hepeviridae)** and changes in segment number
in the Arenaviridae family (Fig. 5, Extended Data Figs. 5 and
6). The latter is particularly interesting as the Arenaviridae were
traditionally thought to be a family of bi-segmented negative-sense
RNA viruses”. However, we discovered two arenavirus species in
fish with genomes comprising three segments, similar to the genome
of the divergent relative of the Arenaviridae family found in arthropods”,
suggesting that there was a decrease in segment number
from three to two (Extended Data Fig. 5). If so, this represents
an important example of a reduction in segment number without
a corresponding loss in gene content, and is hence compatible with
segment merging.
Discussion

Despite a combination of rapid evolution and frequent host-switching, 
our large-scale analysis of virus diversity in previously undersampled 
hosts suggests that RNA viruses in vertebrates tend to broadly follow 
the evolutionary history of their hosts that began in the ocean and 
extends for hundreds of millions of years. These results, which apply to 
most of the vertebrate RNA virus families or genera, are in accord with 
recent analyses of viral evolution using palaeovirological data!*-?%>6, 
and demonstrate the importance of conducting widespread taxonomic 
surveys of virus diversity when trying to reveal evolutionary history. 
These results also have broader implications for our understanding of 
virus evolution. In particular, it is clearly simplistic and perhaps erro- 
neous to identify a specific host group as ancestral to another given that 
our sampling of RNA virus diversity is still so very limited. For example, 
on current data we suggest that it is premature to conclude that verte- 
brate RNA viruses necessarily originated in mosquitoes/ticks, since it is 
possible that the evolution of specific virus families may have followed 
that of the metazoans over an even longer period of co-divergence. In 
summary, our study reveals long-term virus—host relationships for each 
vertebrate-associated virus family that extend over geological times- 
cales, further illustrating the ancient history of RNA viruses. 
METHODS


Sample collection. The goal of this study was to survey animal species that were 
representative of biological diversity within the phylum Chordata, and that have 
only rarely been analysed for the presence of RNA viruses. Accordingly, we focused 
on amphibians, reptiles and fish rather than birds and mammals that have been 
studied in far greater detail (Fig. 1d). We also targeted species distributed at diverse 
locations across the vertebrate phylogeny (Fig. 1a), although those species associ- 
ated with most basal vertebrate lineages are often rare. For each species we sampled 
1-24 individuals to represent a population. No statistical methods were used to 
predetermine sample size. The procedures for sampling and sample processing 
were approved by the ethics committee of the National Institute for Communicable 
Disease Control and Prevention of the Chinese CDC. 

In total, we sampled two species from the subphylum Cephalochordata (that 
is, lancelets), with the remainder from the subphylum Vertebrata (Supplementary 
Table 1, Fig. 1a). Within Vertebrata, we sampled two species each from the classes 
Agnatha (that is, jawless fish) and Sarcopterygii (that is, lungfish), as these are 
relatively rare. Most of our aquatic samples were from the classes Chondrichthyes 
(cartilaginous fish), from which we sampled 19 species, and Actinopterygii 
(bony fish), from which we sampled more than 130 species across 20 orders 
(Supplementary Table 1). With respect to land tetrapods, we sampled 12 species 
from the class Amphibia, including the orders Anura, Caudata and Gymnophiona,
and 17 species from the class Reptilia, including the orders Testudines and 
Squamata (Supplementary Table 1). 

With the exception of lungfish samples, which were obtained from Nigeria 
(Protopterus annectens) and Chile (Lepidosiren sp.), respectively, all other samples 
were collected in China (Supplementary Table 1). The marine species were sampled 
from the South China Sea, East China Sea and Yellow Sea, mostly from fishing 
vessels. The samples were kept at −20 °C on the boat before being transferred to
−80 °C for storage. The remaining marine samples were either collected frozen
from the returned ships at the dock, or purchased alive from local fishermen at
nearby markets. The freshwater fish samples were captured alive using fishing rods
or nets from rivers and lakes in Hubei, Heilongjiang and Guangdong provinces.
The reptile and amphibian samples were caught by field biologists from a wide
range of geographic locations, including Fujian, Guangdong, Guangxi, Xinjiang
and Zhejiang provinces.

For most of the animal samples, three types of internal organs were harvested, 
comprising the gut, liver and gill for jawless, cartilaginous, and ray-finned fish, 
and gut, liver and lung for amphibians and reptiles (Supplementary Table 1). For 
lungfish, all four types of tissue (that is, gut, liver, lung and gill) were obtained. 
For lancelets, the entire individual was used owing to their small body size. All 
specimens were stored at −80 °C for later RNA extraction.

Host species were initially identified by experienced field biologists
on capture based on morphological traits, and were later confirmed by sequencing
and analysing the partial cytochrome c oxidase subunit I (COI) gene from each sample
(approximately 600-700 nucleotides near the 5′ end of the gene).
RNA library construction and sequencing. RNA was extracted from individ- 
ual animal specimens. For the initial screening of viruses, aliquots of RNA from 
several (that is, from 13 to 62) individuals of a particular taxonomic group or 
multiple taxonomic groups were pooled for library preparation and sequencing 
(Supplementary Table 1). After determining the presence of a specific virus, a 
subset of the initial pool or the individual un-pooled RNA extractions was 
subjected to library construction and sequencing to obtain better genome coverage
(Supplementary Table 1). 

For each RNA extraction, we first transferred approximately 30 mg from the
specimen to 500-700 µl of standard, sterile, RNA- and DNA-free 1× PBS solution
(GIBCO). The tissue was then homogenized in the PBS solution using the Mixer
mill MM400 (Retsch). Total RNA was extracted from the homogenates using
TRIzol LS reagent (Invitrogen) and subsequently purified using RNeasy Plus Mini 
Kit (Qiagen). Aliquots of the resultant RNA solutions were then pooled in equal 
quantity. The quality of the pooled RNA was evaluated using an Agilent 2100 
Bioanalyzer (Agilent Technologies) before library construction and sequencing. 
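Pooling "in equal quantity" amounts to adding the same mass of RNA from each individual, so the volume drawn from each extract is the target mass divided by that extract's concentration. A minimal sketch, assuming concentrations in ng/µl and a hypothetical target of 500 ng per individual; the function and sample names are illustrative, not taken from the study.

```python
from typing import Dict


def pooling_volumes(conc_ng_per_ul: Dict[str, float],
                    ng_per_sample: float) -> Dict[str, float]:
    """Volume (µl) of each RNA extract that contributes ng_per_sample ng."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        # mass = concentration x volume, so volume = mass / concentration
        volumes[sample] = ng_per_sample / conc
    return volumes
```

With hypothetical extracts at 250, 100 and 50 ng/µl and a 500 ng target, the volumes are 2, 5 and 10 µl, so the most dilute extract dictates whether the chosen target mass is achievable.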

The TruSeq total RNA Library Preparation protocol (Illumina) was used for 
all library preparations. Ribosomal RNA (rRNA) was removed using the Ribo-Zero
Gold (Epidemiology) Kit (Illumina) for most of the libraries, with the exception 
of LXMC-PolyA and XYHYMC-PolyA for which poly(A) enrichment was used 
(Supplementary Table 1). The average fragmentation size for these libraries was 
either 200 bp or 300 bp. Accordingly, 100 bp and 150 bp paired-end sequencing of 
the RNA libraries was performed on the HiSeq 2500 and HiSeq 4000 platforms
(Illumina), respectively. All library preparation and sequencing was carried out 
by BGI Tech (Shenzhen). 

RNA virus discovery. For each library, sequencing reads were first adaptor- and 
quality-trimmed using the Trimmomatic program’ with the following options:
SLIDINGWINDOW:4:5, LEADING:5, TRAILING:5, MINLEN:25. The remain- 
ing reads were assembled de novo using the Trinity program (version 2.1)”8 with 
default parameter settings. To identify viral contigs, the assembled contigs were
compared (using blastx) against the database comprising reference RNA virus 
proteins downloaded from GenBank. The E-value cut-off for these comparisons 
was set at 1 x 10~>. To eliminate false positives, these putative viral contigs were 
compared against the entire non-redundant nucleotide and protein database. The 
remaining contigs with unassembled overlaps were merged to form longer viral 
contigs using the SeqMan program implemented in the Lasergene software package 
(version 7.1, DNAstar). 
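The quality-trimming options above can be illustrated with a simplified pure-Python sketch. It is not Trimmomatic itself: it assumes Phred+33 quality strings, applies the steps in the order listed (SLIDINGWINDOW, LEADING, TRAILING, MINLEN), and approximates SLIDINGWINDOW:4:5 by cutting at the start of the first 4-base window whose mean quality falls below 5; Trimmomatic's own implementation may place the cut point slightly differently. The function name `trim_read` is hypothetical, and adaptor clipping is not modelled.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Illumina 1.8+) quality encoding


def phred(qual: str) -> List[int]:
    """Convert an ASCII quality string to integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def trim_read(seq: str, qual: str, window: int = 4, window_q: float = 5,
              leading: int = 5, trailing: int = 5,
              minlen: int = 25) -> Optional[Tuple[str, str]]:
    """Return the trimmed (seq, qual), or None if the read is dropped."""
    q = phred(qual)

    # SLIDINGWINDOW:4:5 (simplified): scan from the 5' end and cut at the
    # start of the first window whose mean quality is below window_q.
    keep = len(q)
    for start in range(len(q) - window + 1):
        if sum(q[start:start + window]) / window < window_q:
            keep = start
            break
    q, seq, qual = q[:keep], seq[:keep], qual[:keep]

    # LEADING:5: remove bases below the threshold from the 5' end.
    first = 0
    while first < len(q) and q[first] < leading:
        first += 1

    # TRAILING:5: remove bases below the threshold from the 3' end.
    last = len(q)
    while last > first and q[last - 1] < trailing:
        last -= 1
    seq, qual = seq[first:last], qual[first:last]

    # MINLEN:25: drop reads that are now too short.
    if len(seq) < minlen:
        return None
    return seq, qual
```

For instance, a 40-base read whose last eight bases have quality 2 ('#') is cut back to its 32 high-quality bases, while a 20-base read is discarded by MINLEN:25 even if every base is of high quality.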

Among all the virus contigs discovered, those likely to be associated with 
vertebrates (that is, vertebrate-specific viruses and vector-borne vertebrate viruses) 
were initially identified based on a closer relationship to established vertebrate- 
associated viruses than to other taxa in a Blast analysis (that is, known vertebrate- 
associated viral families/genera were the top blast hits). This relationship was later 
confirmed by more detailed phylogenetic analyses including viruses representative 
of both vertebrates and a wider variety of non-vertebrate organisms”*”! (Extended 
Data Figs. 1 and 2). 
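The screening logic described above (a blastx hit to the viral protein database, a check that the best hit in the full non-redundant database is also viral, then flagging contigs whose top hit belongs to a known vertebrate-associated family) can be sketched in Python. This assumes BLAST tabular output (`-outfmt 6`, whose default 11th and 12th columns are the E-value and bit score). The subject-to-family table and the per-contig "best nr hit is viral" flags are hypothetical inputs, and the E-value cut-off is left as a parameter because the value used in the study is not legible in this text.

```python
import csv
import io
from typing import Dict, Set, Tuple

Hit = Tuple[str, float, float]  # (subject id, E-value, bit score)


def best_hits(outfmt6: str) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query from BLAST -outfmt 6 text."""
    best: Dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(outfmt6), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def vertebrate_candidates(virus_hits: Dict[str, Hit],
                          nr_best_is_viral: Dict[str, bool],
                          subject_family: Dict[str, str],
                          vertebrate_families: Set[str],
                          evalue_cutoff: float) -> Dict[str, str]:
    """Map contig -> family for contigs that pass all three filters."""
    candidates = {}
    for contig, (subject, evalue, _) in virus_hits.items():
        if evalue > evalue_cutoff:
            continue  # no significant similarity to a known viral protein
        if not nr_best_is_viral.get(contig, False):
            continue  # best hit in the full database is non-viral
        family = subject_family.get(subject)
        if family in vertebrate_families:
            candidates[contig] = family  # top hit is vertebrate-associated
    return candidates
```

In this sketch a contig is kept only when every filter agrees; as in the text, the top-hit assignment is provisional and would still need confirmation by phylogenetic analysis.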

For the vertebrate-associated viruses, we determined which samples contained
the viruses, and hence their potential host(s), using PCR with reverse transcription
(RT-PCR) and sequencing. Accordingly, for each virus, we designed 2-3 pairs of 
primers based on the viral contigs and screened all the unpooled RNA extractions 
of the corresponding library. The target PCR products were then validated by 
Sanger sequencing. 

Gaps in incomplete vertebrate-associated virus genomes were filled by either
RT-PCR or by re-sequencing (using the meta-transcriptomics approach described
above) on the individual RNA samples that contained the target virus. Genome 
termini were determined by RNA circularization as previously described”, or by 
using the 5'/3’ RACE kits (TaKaRa). Confirmation of most of the viral genome 
sequences was performed by read mapping using Bowtie2”’, with the final major- 
ity consensus sequences determined from the final assembly of mapped reads 
using Geneious v.8°°. For virus species with multiple variants in the same pool, 
we performed meta-transcriptomics or RT-PCR and Sanger sequencing of 
the entire genome from the individual positive sample. To exclude the possibility 
that these contigs belonged to expressed EVEs (see below), we used PCR and 
Sanger sequencing to examine the DNA extracted from the homogenates of the 
corresponding samples. 
Searching existing databases for vertebrate viruses. To discover more vertebrate- 
associated viruses and hence enrich our dataset, we downloaded the entire 
Transcriptome Shotgun Assembly (TSA) sequence database which was then used 
as query to search against the virus protein database as previously described. 
Because not all transcriptome sequences have a TSA (assembled) entry, we also 
examined reads deposited in the Sequence Read Archive (SRA) database. We tar- 
geted basal vertebrate taxa with inadequate or limited sampling, including lancelets 
(NCBI taxonomy ID: 7736), jawless vertebrates (NCBI taxonomy ID: 1476529), 
cartilaginous fish (NCBI taxonomy ID: 7777), and lungfish (NCBI taxonomy ID: 
7878). These reads were assembled using Trinity and compared against the virus 
protein database as described above. Unfortunately, no vertebrate-associated 
viruses were found in these read archives. 

To reveal viruses that may have infected vertebrates in the evolutionary past, we
searched within the vertebrate genomes for EVEs that were relatively closely related
to the viruses discovered in this study, especially those that did not belong to any 
established vertebrate clade. Accordingly, we first downloaded all the assembled 
genome sequences within the taxonomic group Vertebrata (NCBI taxonomy ID: 
7742) from the NCBI genome FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/). 
We then compared the translated viral protein sequences discovered in this study 
against all the assembled vertebrate genomes using the tblastn program, with an
E-value cut-off set at 1 x 10~7°. For each potential EVE, the query process was 
reversed to determine their phylogenetic positions. The alignment of EVEs and 
exogenous viruses was checked manually to exclude false-positives. 
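The forward-then-reverse search for EVEs can be summarized as a filter. In this hedged sketch the forward tblastn hits are given as records, and the reverse search is reduced to a hypothetical per-locus flag saying whether the best reverse hit is a viral protein. The study additionally placed EVEs phylogenetically and checked alignments manually, which this sketch does not attempt. Locus labels, virus names and the cut-off are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GenomeHit:
    locus: str        # hypothetical locus label, e.g. "scaffold_12:1050-2300"
    query_virus: str  # viral protein used as the tblastn query
    evalue: float


def confirm_eves(forward_hits: Iterable[GenomeHit],
                 reverse_best_is_viral: Dict[str, bool],
                 evalue_cutoff: float) -> Dict[str, GenomeHit]:
    """Keep the best forward hit per locus whose reverse search is viral."""
    confirmed: Dict[str, GenomeHit] = {}
    for hit in forward_hits:
        if hit.evalue > evalue_cutoff:
            continue  # forward hit not significant
        if not reverse_best_is_viral.get(hit.locus, False):
            continue  # reverse search points to a host gene: not an EVE
        current = confirmed.get(hit.locus)
        if current is None or hit.evalue < current.evalue:
            confirmed[hit.locus] = hit  # retain the strongest viral match
    return confirmed
```

Reversing the query in this way guards against host genes that share short motifs with viral proteins, which a one-directional search would otherwise report as EVEs.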
Virus genome characterization. For newly identified virus genomes, the prediction
of the potential open reading frames (ORFs) was based on those from
the related reference virus genomes. The annotation of ORFs was first based on 
comparisons against the Conserved Domain Database (CDD) and then against 
the non-redundant protein database. The remaining proteins were characterized by 
predicting their primary protein structure using the programs NetNGlyc, SignalP, 
and TMHMM (http://www.cbs.dtu.dk/services/). For example, the divergent glyco- 
protein genes of negative-sense RNA families were identified based on the presence 
of (i) an N-terminal signal peptide, (ii) a mid-point or C-terminal transmembrane 
domain, and (iii) putative N-linked glycosylation sites. Finally, the sequencing 
depth of each viral genome within the library was estimated based on the percent- 
age of total reads that mapped to the target genome. 
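The three glycoprotein criteria can be combined in a short sketch. N-linked glycosylation sites are found with the standard N-X-S/T sequon (X not proline). Signal-peptide and transmembrane predictions are assumed to come from external tools such as SignalP and TMHMM and are passed in as plain values. "Mid-point or C-terminal" is interpreted here, as an illustrative assumption, as a helix whose centre lies at or beyond half the protein length.

```python
import re
from typing import List, Tuple

# N-X-S/T sequon with X != proline; the zero-width lookahead reports
# overlapping sequons as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> List[int]:
    """1-based positions of asparagines in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def looks_like_glycoprotein(protein: str,
                            has_signal_peptide: bool,
                            tm_helices: List[Tuple[int, int]],
                            min_tm_centre_fraction: float = 0.5) -> bool:
    """Apply the three criteria to externally predicted features.

    tm_helices holds 1-based (start, end) positions; the 0.5 threshold
    is a hypothetical reading of "mid-point or C-terminal".
    """
    length = len(protein)
    late_tm = any((start + end) / 2 >= min_tm_centre_fraction * length
                  for start, end in tm_helices)
    return has_signal_peptide and late_tm and bool(n_glycosylation_sites(protein))
```

Requiring all three features together keeps the rule conservative: a divergent protein with only glycosylation motifs, or only a signal peptide, is not called a glycoprotein.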

In the case of segmented viruses, most of the non-RdRp segments were recov- 
ered by homology comparisons. However, divergent members of the families 
Hantaviridae and Arenaviridae had glycoproteins that lacked clear homology with 
those of other family members. To look for these segments we first annotated 
all contigs that were of similar sequencing depths by comparing them to the nr
database. This removed most sequences of host origin. For the remaining contigs,
we examined (i) the potential glycoprotein structure (that is, signal peptide,
transmembrane domains and glycosylation sites), (ii) the presence of inverted
complementary genome termini identical to those of other segments,
(iii) whether all the segments were found in the same samples and (iv) whether its 
closest relative contained the related segment. Only when all four criteria were sat- 
isfied did we conclude that these segments most likely belonged to the same virus. 
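The four-criterion rule for assigning an orphan segment to a virus can be sketched as below. Criterion (ii) is checked directly from sequence: the first k nucleotides must be the reverse complement of the last k (the inverted complementary termini), and both ends must match a known segment of the same virus. The other criteria are passed in as precomputed values. Criterion (iii) is interpreted as identical sets of positive samples. The terminus length `k = 10` and all names are hypothetical.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA string, returned in DNA letters."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def has_complementary_termini(seq: str, k: int) -> bool:
    """True if the 5' k-mer is the reverse complement of the 3' k-mer."""
    s = seq.upper().replace("U", "T")
    return len(s) >= 2 * k and s[:k] == reverse_complement(s[-k:])


def same_termini(seq: str, other: str, k: int) -> bool:
    """True if both ends of seq match the ends of another segment."""
    a = seq.upper().replace("U", "T")
    b = other.upper().replace("U", "T")
    return a[:k] == b[:k] and a[-k:] == b[-k:]


def belongs_to_virus(candidate: str, known_segment: str,
                     glycoprotein_like: bool,
                     candidate_samples: Set[str], known_samples: Set[str],
                     relative_has_segment: bool, k: int = 10) -> bool:
    """Accept the candidate segment only if all four criteria hold."""
    criteria = (
        glycoprotein_like,                                        # (i)
        has_complementary_termini(candidate, k)
        and same_termini(candidate, known_segment, k),            # (ii)
        candidate_samples == known_samples,                       # (iii)
        relative_has_segment,                                     # (iv)
    )
    return all(criteria)
```

Because `all` requires every criterion, a candidate that matches the termini but was found in a different set of samples is rejected, mirroring the conservative rule in the text.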
Inferring virus evolutionary history. We examined the phylogenetic relationship 
among these viruses at two levels: (i) an overall evolutionary history that placed the 
vertebrate-associated viruses in the context of viruses sampled from other hosts, 
and (ii) family/genus specific phylogenies that provide a more detailed depiction 
of the evolutionary relationships within each of the vertebrate-associated virus 
families/genera. At the family/genus level, we included as background all reference 
virus replicase sequences (that is, RNA-dependent RNA polymerase; RdRp) as 
well as replicases from non-reference viruses that occupied a unique phylogenetic 
position and which had an established host association. At the overall level, we 
included viral replicases representative of a broader phylogenetic diversity”! in 
addition to those used in the family/genus level phylogenies. 

For each dataset, the virus replicases were aligned using the E-INS-i algorithm
implemented in the program MAFFT (version 7)*", with all ambiguously aligned
regions subsequently removed using TrimAl (version 1.2)**. The best-fit
model of amino acid substitution in each dataset was determined using ProtTest 
(version 3.4)**. Phylogenetic trees were then inferred using the maximum likeli- 
hood method implemented in PhyML (version 3.0)*4, using the best-fit substitution 
model and Subtree Pruning and Regrafting (SPR) branch-swapping. Support for 
specific nodes on the trees was assessed using an approximate likelihood ratio 
test (aLRT) with the Shimodaira-Hasegawa-like procedure. In addition, phylo- 
genetic trees were inferred using the Bayesian method implemented in the program 
MrBayes v.3.2*°, using the same amino acid substitution models. Because the tree 
topologies generated by the two programs were largely identical, only maximum 
likelihood phylogenies are shown here. 
Examining virus-host evolutionary relationships. We used the BaTS (Bayesian 
tip-association significance testing) program” to test whether viruses cluster more 
strongly with particular host taxonomic groups than expected by chance alone. 
This analysis considered host phylogenetic structure at the level of vertebrate class: 
that is, mammals, reptiles and birds, amphibians, lungfish, bony fish, cartilaginous 
fish, and jawless fish. Accordingly, we estimated the association index!? within 
BaTS to determine the strength of the association between virus phylogeny and 
host class. This was then compared to a null distribution generated using 1,000 
replicates of state randomization across a credible set of trees generated by MrBayes 
as described above. 
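The association index used by BaTS follows Wang et al.: for each internal node i with n_i descendant tips, where f_i is the frequency of the most common host class among those tips, AI is the sum of (1 − f_i)/2^(n_i − 1). A minimal sketch of the statistic, the observed/null ratio reported in Table 1, and a one-sided P value from tip-label shuffling follows. Unlike BaTS, it uses a single fixed tree written as nested tuples rather than a credible set of MrBayes trees; tip names and host labels are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple, Union

# A tip is a sequence name (str); an internal node is a tuple of child nodes.
Tree = Union[str, tuple]


def tip_names(tree: Tree) -> List[str]:
    """Tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tip_names(child)]


def association_index(tree: Tree, host: Dict[str, str]) -> float:
    """AI = sum over internal nodes of (1 - f) / 2**(n - 1)."""
    total = 0.0

    def visit(node: Tree) -> Counter:
        nonlocal total
        if isinstance(node, str):
            return Counter([host[node]])
        counts: Counter = Counter()
        for child in node:
            counts += visit(child)
        n = sum(counts.values())          # tips below this node
        f = max(counts.values()) / n      # share of the commonest host class
        total += (1 - f) / 2 ** (n - 1)
        return counts

    visit(tree)
    return total


def ai_ratio(tree: Tree, host: Dict[str, str], n_rand: int = 1000,
             seed: int = 1) -> Tuple[float, float]:
    """Observed/null AI ratio and one-sided P from tip-label shuffling."""
    rng = random.Random(seed)
    names = tip_names(tree)
    labels = [host[name] for name in names]
    observed = association_index(tree, host)
    null = []
    for _ in range(n_rand):
        rng.shuffle(labels)
        null.append(association_index(tree, dict(zip(names, labels))))
    mean_null = sum(null) / n_rand
    if mean_null == 0:
        raise ValueError("all tips share one host class; AI is undefined")
    p_value = sum(value <= observed for value in null) / n_rand
    return observed / mean_null, p_value
```

For a four-tip tree in which the two fish viruses and the two mammal viruses form separate cherries, only the root contributes (0.5/8 = 0.0625). Interleaving the classes raises AI to 0.5625, so lower values, and ratios closer to 0, indicate stronger host structure, as in the Table 1 footnote.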

To examine the extent of virus-host co-divergence in each vertebrate-specific 
virus family/genus, we performed event-based co-phylogenetic reconstructions 
using the Jane program (version 4)'”. The virus phylogenies were based on the
family/genus level phylogenies estimated here, from which we removed viruses with
no host information. All ‘generalist’ viruses (that is, those that infect more than 
three species of hosts) were included in the analyses as unresolved parallel lineages. 
The corresponding host topologies were obtained from both the TIMETREE web- 
site (http://www.timetree.org/) and a previous phylogeny of bony fish*”. The ‘cost’ 
scheme for analyses in Jane was set as follows: co-divergence = 0, duplication = 1, 
host switch = 1, loss = 1, failure to diverge = 1. The number of generations and 
the population size were both set to 100. The significance of co-divergence was 
derived by comparing the estimated costs to null distributions calculated from 100 
randomizations of host tip mapping. 
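Jane performs the event-based reconciliation itself. The sketch below covers only the two steps stated above: scoring a reconstruction under the given cost scheme, and comparing the observed cost with costs from randomized tip mappings. The event counts and null costs in the usage note are hypothetical, and taking the fraction of randomizations with cost ≤ observed is one common convention for the empirical P value.

```python
from typing import Dict, Sequence

# Cost scheme used in the Jane analyses described above.
EVENT_COSTS = {
    "co-divergence": 0,
    "duplication": 1,
    "host switch": 1,
    "loss": 1,
    "failure to diverge": 1,
}


def total_cost(event_counts: Dict[str, int]) -> int:
    """Total reconstruction cost for a set of inferred event counts."""
    return sum(EVENT_COSTS[event] * n for event, n in event_counts.items())


def empirical_p(observed_cost: int, null_costs: Sequence[int]) -> float:
    """Fraction of randomized tip mappings at least as cheap as observed."""
    return sum(c <= observed_cost for c in null_costs) / len(null_costs)
```

For example, 12 co-divergences, 3 duplications, 8 host switches and 5 losses give a total cost of 16. Because co-divergence is free, a lower total cost than random tip mappings indicates more co-divergence than expected by chance.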

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All sequence reads generated in this study are available at the 
NCBI Sequence Read Archive (SRA) database under the BioProject accession 
PRJNA418053 (Supplementary Table 1). All viral sequences generated in this 
study have been deposited in GenBank under the accession numbers MG599863- 
MG600130 (Supplementary Table 2). All virus nucleotide sequences (fasta format), 
the unaligned and the aligned data set used in the phylogenetic analyses (fasta format), 
as well as the phylogenetic trees (newick and MEGA5 mts format), are available 
on the Figshare website at https://Figshare.com/articles/The_evolutionary_history_of_vertebrate_RNA_viruses/5405620.
Extended Data Fig. 1 | Phylogenetic positions of vertebrate-associated
positive-sense and double-stranded RNA viruses within the broader
diversity of RNA viruses. Phylogenies were estimated using a maximum
likelihood method and midpoint-rooted for clarity only. Viruses
discovered here are labelled with solid black circles. The name of the major
clade (phylogeny) is shown at the top of each tree, and taxonomic names
are shown to the right. The vertebrate-associated virus diversity is shaded
in grey. All horizontal branch lengths are scaled to the number of amino
acid substitutions per site.
Extended Data Fig. 2 | Phylogenetic positions of vertebrate-associated
negative-sense RNA viruses within the broader diversity of RNA
viruses. Phylogenies were estimated using a maximum likelihood method
and midpoint-rooted for clarity only. Viruses discovered here are labelled
with solid black circles. The name of the major clade (phylogeny) is shown
at the top of each tree, and taxonomic names are shown to the right.
The vertebrate-associated virus diversity is shaded in grey. All horizontal
branch lengths are scaled to the number of amino acid substitutions
per site.
(Figure panels: vertebrate-associated astro-like viruses; vertebrate-associated chuviruses.)
Extended Data Fig. 3 | The phylogenies of potentially new families
of vertebrate-associated viruses. Viruses identified from vertebrate
hosts are shaded with different colours. Sequences recovered from the
Transcriptome Shotgun Assembly (TSA) database are marked with solid
black diamonds, while those recovered from the Whole-Genome Shotgun
(WGS) contigs database (that is, endogenous virus elements) are marked
with open triangles. For vertebrate viruses, the relevant taxonomic and
tissue information is provided in the sequence names.
Extended Data Fig. 4 | Evolutionary history of four groups of vector-borne
RNA viruses. Each phylogenetic tree was estimated using a maximum
likelihood method. Within each phylogeny, the viruses newly identified
here are marked with solid black circles, the vertebrate host groups are
indicated by different colours, and the vector symbol is shown next to
viruses known to be transmitted by vectors. The name of the virus family
or genus is shown at the top of each phylogeny, and the lower-level virus
taxonomic names are shown to the right.
Extended Data Fig. 5 | Evolution of vertebrate-associated negative-sense
RNA virus genomes. Representative genomes from negative-sense
RNA virus families/genera are shown. The regions that encode
connected by orange dotted lines. Host associations are reflected in the
colour of the virus names. Host association colour schemes and the
abbreviations of functional domains are described at the bottom of the
Cryo-EM structure of the Blastochloris
viridis LH1-RC complex at 2.9 Å


Pu Qian, C. Alistair Siebert, Peiyi Wang, Daniel P. Canniffe & C. Neil Hunter


The light-harvesting 1-reaction centre (LH1-RC) complex is a key functional component of bacterial photosynthesis. Here
we present a 2.9 Å resolution cryo-electron microscopy structure of the bacteriochlorophyll b-based LH1-RC complex
from Blastochloris viridis that reveals the structural basis for absorption of infrared light and the molecular mechanism of
quinone migration across the LH1 complex. The triple-ring LH1 complex comprises a circular array of 17 β-polypeptides
sandwiched between 17 α- and 16 γ-polypeptides. Tight packing of the γ-apoproteins between β-polypeptides collectively
interlocks and stabilizes the LH1 structure; this, together with the short Mg-Mg distances of bacteriochlorophyll b pairs,
contributes to the large redshift of bacteriochlorophyll b absorption. The ‘missing’ 17th γ-polypeptide creates a pore
in the LH1 ring, and an adjacent binding pocket provides a folding template for a quinone, QP, which adopts a compact,
export-ready conformation before passage through the pore and eventual diffusion to the cytochrome bc1 complex.


Photosynthesis provides the energy for almost all life on Earth. In 
the early stages of photosynthesis, light-harvesting complexes absorb 
solar energy, which is transferred to a membrane-bound RC, where a 
charge separation initiates the eventual formation of a reduced electron
acceptor. The basic functional unit in purple phototrophic
bacteria is LH1-RC, the complex of LH1 and the RC, in which the
RC is surrounded by a ring-like oligomeric assembly of LH1 α-β
heterodimers that bind bacteriochlorophyll (BChl) and carotenoid
pigments. LH1-RC complexes in different species exhibit a variety
of architectures: 16 LH1 α-β pairs completely encircle the RC in the
Thermochromatium tepidum and Rhodospirillum rubrum complexes,
whereas in Rhodopseudomonas palustris the RC is encircled by
an open LH1 ring consisting of 15 α-β pairs and a W polypeptide.
The Rhodobacter sphaeroides complex has a dimeric core, in which
each monomer has 14 α-β pairs associated with one RC; two monomers
associate through two PufX polypeptides to form an S-shaped
LH1 ring.

A high level of structural detail is required to account for the ability 
of LH1-RC complexes to absorb solar energy within a specific spectral 
range and to drive the formation of a quinol, which must traverse the 
confines of the LH1 ring encircling the RC. We identified the LH1-RC
complex from Blastochloris (Blc.) viridis as a suitable target for a
high-resolution structural study because it possesses unique
architectural and spectroscopic features. Notably, the RC in the Blc. viridis
photosynthetic complex yielded the first reported structure of a
membrane protein complex, but electron microscopy has provided only
low-resolution structures for the complete LH1-RC complex. This
complex accommodates BChl b rather than BChl a, and absorbs in the
infrared at 1,015 nm, making it one of the most redshifted
photosynthetic complexes described to date and one proposed as the basis for
re-engineered photosynthesis. There is currently no known
structural basis for this unusual in vivo absorption, which represents one
of the largest redshifts observed in a photosynthetic pigment-protein
complex: 220 nm from the 795 nm absorption maximum of BChl b in
methanol. This property could be related to the composition of the
Blc. viridis LH1 complex, which contains α-, β- and γ-polypeptides,
but the position and function of the γ-subunit within the LH1
complex remain poorly understood. The Blc. viridis LH1 contains rare
1,2-dihydro-derivatives of neurosporene and lycopene as major
carotenoids. The LH1-RC complex forms extensive arrays in the lamellar
membranes of Blc. viridis, which have been hypothesized to consist
of closed 16-membered LH1 rings that completely encircle each RC.
However, such an arrangement of LH1 subunits represents a potential
obstacle for quinol export from the RC to the external quinone pool in
the membrane and eventual reduction of the cytochrome bc1 complex.
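To put the 220 nm redshift on an energy scale, the two wavelengths can be converted to wavenumbers (cm⁻¹) and photon energies (eV). This is a minimal sketch; the constant hc = 1239.84198 eV nm is a standard physical value, not taken from the text, and the helper names are illustrative.

```python
# Planck constant times speed of light, in eV*nm (standard value, rounded).
HC_EV_NM = 1239.84198


def nm_to_wavenumber(nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / nm


def nm_to_ev(nm: float) -> float:
    """Convert a wavelength in nm to a photon energy in eV."""
    return HC_EV_NM / nm


def redshift_energy(ref_nm: float, shifted_nm: float):
    """Energy lost on moving from ref_nm to shifted_nm: (cm^-1, eV)."""
    return (nm_to_wavenumber(ref_nm) - nm_to_wavenumber(shifted_nm),
            nm_to_ev(ref_nm) - nm_to_ev(shifted_nm))


# BChl b in methanol (795 nm) versus LH1 in vivo (1,015 nm).
dk, de = redshift_energy(795, 1015)
print(f"{dk:.0f} cm^-1, {de:.3f} eV")
```

For the 795 nm to 1,015 nm shift this gives roughly 2,700 cm⁻¹, or about 0.34 eV. Because energy scales as 1/λ, the same 220 nm shift would correspond to a larger energy change at shorter wavelengths, so quoting the shift in energy units makes comparisons between pigments fairer.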
Here, we report a 3D cryo-electron microscopy (cryo-EM) structure
at 2.9 Å resolution of this BChl b-based photosynthetic complex from
Blc. viridis. Analysis of this structure shows how γ-apoproteins
influence the large redshift observed in the BChl b Qy absorption band,
reveals the position of the internal quinone channel, and identifies a
third quinone binding site that prepares quinol for export through the
pore in the LH1 ring.


Overall structure of the LH1-RC complex 
Extended Data Fig. 1 shows the absorption spectra of native photosyn- 
thetic membranes and LH1-RC complexes purified from Blc. viridis. 
The absorption maximum at 1,015 nm is ascribed to the Qy band of
BChl b in the LH1 complex, which is slightly blueshifted to 1,008 nm
after detergent solubilization and purification. Following vitrification
of monodisperse complexes, we recorded 6,472 cryo-EM movies, from
which 267,726 particles were picked manually for reference-free 2D
classification. Further processing yielded a final resolution of 2.9 Å,
enabling compilation of a colour-coded electron-density map (Fig. 1a-c)
that reveals the detailed structural architecture of this LH1-RC
complex and the relative locations of all pigments, cofactors and subunits.
The dimensions of the LH1-RC are shown in Fig. 1c, d. The height of
the core complex from the top of the periplasmic cytochrome to the
bottom of the H subunit on the cytoplasmic side is 128.9 Å (Fig. 1a, d),
and the diameters of this structure, which is slightly elliptical in
projection, are 120.2 and 124.5 Å (Fig. 1c). The complex has a molecular
mass of 414 kDa.

The RC in the cryo-EM map is similar to the one in the X-ray
structure (for example, PDB: 1PRC). The RC consists of H, M, L
and cytochrome (C) subunits. Structural differences, indicated by
residue-residue distance deviation, are small in subunits C, M
and L (Extended Data Fig. 2b-d). However, interaction with the
LH1 complex constrains a loop region on RC subunit H (RC-H;
residues 47-54), resulting in a larger deviation from the RC-only
structure (Extended Data Fig. 2a, e). There is also a small
displacement of subunits RC-C and RC-H, which is likely to be caused by
interaction with the LH1 complex, which bends the RC via a hinge
point near the interface between the RC-C and RC-M-RC-L
subunits (Extended Data Fig. 2a). The LH1 complex encircles the
RC, whose C, H, M and L subunits have structures in agreement
with previous studies (Fig. 1b, c, e, Extended Data Fig. 2).

The LH1 complex surrounds the RC to form a closed elliptical
ring. The lengths of the minor and major axes of the elliptical ring,
measured from centre to centre of the transmembrane helices, are
75.2 and 78.7 Å for the α-ring, 107.5 and 111.7 Å for the β-ring, and
109.6 and 114.8 Å for the γ-ring, respectively. The LH1 ring consists
of 17 components: 16 α-β-γ heterotrimers and one α-β heterodimer
(Fig. 1c, f). Each α-, β- and γ-polypeptide contains a single
transmembrane helix. A short N-terminal helix in
α-polypeptides runs parallel to the membrane surface, whereas the
C-terminal region contains a loop structure. No helical structures are
observed in the C- and N-terminal regions of the β-polypeptide. The
N termini of the α- and β-polypeptides are on the cytoplasmic side
of the membrane, but the γ-subunit has the opposite topology, with
its N terminus on the periplasmic side (Extended Data Fig. 3). This
arrangement of LH1 polypeptides creates a triple-ring LH1 complex,
consisting of an inner ring of 17 α-polypeptides, an outer ring of
16 γ-polypeptides, and a ring of 17 β-polypeptides sandwiched between
them (Fig. 1c, f). Each of the 16 γ-polypeptides sits between two
β-polypeptides, and the gap where the ‘missing’ 17th γ-polypeptide
would otherwise be located creates an opening in the LH1 ring
(Fig. 1c, f) for quinol exchange.

Fig. 1 | Cryo-EM structure of the LH1-RC core complex from
Blc. viridis. a-c, Views of the colour-coded LH1-RC density map. LH1-α
(yellow), LH1-β (dark blue), LH1-γ (red), BChl b (light sea green),
carotenoid (orange red), RC-C (green), RC-H (cyan), RC-L (orange)
and RC-M (magenta). Detergent and other disordered molecules are in
grey. a, View in the plane of the membrane; two dashed lines indicate the
likely position of the membrane bilayer. b, Forty-five-degree rotation of
a. c, Perpendicular view from the periplasmic side. Densities outside the
membrane region have been truncated for clarity. d-f, Ribbon models
corresponding to a, b and c but without the truncations in f. The LH1
subunits are numbered in f. Subunits 1 and 17 are outlined with dashed
lines in f.
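The axis lengths of the three LH1 rings are enough to estimate how far apart neighbouring transmembrane helices sit along each ring. A minimal sketch, assuming ideal elliptical geometry (the real rings are not perfect ellipses) and using Ramanujan's approximation for the perimeter; dividing by 17 subunit positions treats the γ-ring's gap as one empty position.

```python
import math


def ellipse_perimeter(major: float, minor: float) -> float:
    """Ramanujan's first approximation to the perimeter of an ellipse.

    `major` and `minor` are full axis lengths (diameters), so they are
    halved to semi-axes before applying the formula.
    """
    a, b = major / 2, minor / 2
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


# Axis lengths (Å), centre to centre of the transmembrane helices.
RINGS = {
    "alpha": (78.7, 75.2),
    "beta": (111.7, 107.5),
    "gamma": (114.8, 109.6),
}
POSITIONS = 17  # 17 subunit positions; the gamma ring has one empty position

for name, (major, minor) in RINGS.items():
    p = ellipse_perimeter(major, minor)
    print(f"{name}: perimeter {p:.1f} Å, {p / POSITIONS:.1f} Å per position")
```

This gives roughly 14 Å per position around the inner α-ring and about 20 to 21 Å around the β- and γ-rings, which is one way to see why the outer γ-polypeptides have room to pack between neighbouring β-polypeptides.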


Two BChl b molecules and one carotenoid, an all-trans-1,2-dihydro-derivative
of neurosporene (n = 9) or lycopene (n = 11),
are non-covalently bound between each α-β pair, and no pigment
molecules are bound to the γ-polypeptide (Fig. 3). Major cofactors
bound within the RC are as previously reported, with the addition of
the ubiquinone-9 QP, and are arranged in the expected local
pseudo-two-fold rotational symmetry (Fig. 2).


Interactions that stabilize the LH1 ring

The cryo-EM model of the LH1-RC from Blc. viridis reveals a
complicated interconnecting series of protein-protein, pigment-protein
and pigment-pigment associations within the LH1 ring. For
simplicity, LH1 heterotrimer subunits 1, 2 and 3 are used to
demonstrate the stabilizing intra- and inter-subunit interactions in the LH1
complex. Inter-subunit hydrogen bonds on the periplasmic side are
α(n)-Arg44 to β(n − 1)-Val55 (bond length 3.0 Å) and β(n)-Arg44 to
β(n − 1)-Ala48 (3.3 Å) (Fig. 3a). There is an intra-subunit hydrogen
bond between α-Arg44 on the periplasmic side and the carboxyl group
of β-Trp46 (3.0 Å), which stabilizes the C-terminal loops of both the
α- and β-polypeptides (Fig. 3b). The γ(n)-polypeptide forms two hydrogen
bonds with the α(n)- and β(n)-polypeptides: γ-Asp14 to β-Trp41
(3.0 Å) and γ-Arg36 to the carboxyl group of α-Thré6 (3.1 Å) (Fig. 3b).
Thus, an LH1 heterotrimer subunit is formed from α(n)-β(n)-γ(n),
and not α(n + 1)-β(n + 1)-γ(n). This arrangement suggests an
assembly sequence for the LH1 complex. It is likely that once an
α(1)-β(1) subunit is formed, it interacts with RC-H to form an anchor
point through the hydrogen bond between α(1)-Arg19 and RC-H-Ser256.
Then, the γ(1)-polypeptide binds to the α-β subunit to form
the first LH1 subunit, α(1)-β(1)-γ(1). To do so, the γ-polypeptide needs
space to access the α-β subunit by rotating and translating to achieve
the correct angle of approach and the correct orientation. This
procedure continues until the 17th α-β subunit is assembled. At this point,
there is no space for a correct direction of approach and orientation that
would allow the 17th γ-polypeptide to dock with the 17th α-β subunit,
resulting in a gap in the LH1 ring.

The LH1-RC from Blc. viridis reveals the basis for the stabilizing
effects of carotenoids, which mainly rely on hydrophobic forces,
and for excitation energy transfer from carotenoids to BChls.
Interactions of each carotenoid with n + 1, n and n − 1 polypeptides
and with bound BChls effectively crosslink one LH1 α-β subunit to
the next (Fig. 3c). One end of the carotenoid is in close proximity
to the upstream neighbouring LH1 α(n + 1) near its C terminus
(Phe37, 3.1 Å; Leu33, 3.7 Å; Ala32, 3.4 Å; His36, 3.9 Å); the other end
approaches the downstream neighbouring LH1 α(n − 1) near its N
terminus (Leu11, 4.3 Å; Lys10, 5.1 Å). In particular, this end of the
carotenoid is also in close proximity to a β-polypeptide N terminus. The middle
part of the carotenoid is close to the phytyl tails of the α- (3.2 Å)
and β- (4.0 Å) BChl b molecules (Fig. 3c).

Fig. 2 | Pigment arrangement in the Blc. viridis LH1-RC core complex.
a, Pigment molecules viewed from the periplasmic side by tilting 45°
in the plane of the membrane. b, RC pigment molecules viewed from
the membrane plane. A local pseudo-C2 symmetry axis is shown as a
dashed line. Car, carotenoid; HEM, haem cofactors of the cytochrome
subunit; SP, special pair of BChl b pigments; Fe, non-haem iron; BPhe b,
bacteriopheophytin b.

Subunits 1-16 of the LH1 complex (Fig. 1f) each consist of one α-, one β-
and one γ-polypeptide, two BChl b molecules and one all-trans
carotenoid. The γ-polypeptide has no histidine residue and does not
bind BChl b. Figure 3b illustrates this point using subunit 3: α-His36
coordinates the central Mg of α-BChl b (2.5 Å) and β-His37
coordinates that of β-BChl b (2.2 Å) (Fig. 3b, c). The C3-acetyl groups
of α-BChl b and β-BChl b form hydrogen bonds with α-Trp47 (2.9 Å)
and β-Trp46 (2.9 Å), respectively, to orientate the bacteriochlorin rings
of BChl b. This orientation is further stabilized by a 3.0 Å hydrogen
bond between β-Tyr29 and the ester group of β-BChl b on C13². The
OH group of β-Tyr29 could form a hydrogen bond with the ester group
on the phytyl tail of the α- or β-BChl b.

Fig. 3 | Intra- and inter-subunit protein-protein and protein-pigment
interactions. a, LH1 subunits 1-3 (Fig. 1f) illustrate inter-subunit
interactions. Colours as in Fig. 1 except BChl b molecules in medium
blue and all-trans-1,2-dihydroneurosporene in orange. Hydrogen bonds
are indicated by dashed lines. b, A single LH1 α-β-γ subunit, with the
polypeptides shown in loop representation for clarity. The red arrow
indicates a putative direction of approach of the γ-polypeptide to the
α-β pair during assembly of the complex. c, Projection view to show
interactions made by a carotenoid with nearby pigments and polypeptides.

Fig. 4 | Interactions between the RC and the LH1 complex, and within
the LH1 complex. a, Periplasmic side of the LH1-RC core complex; colour
coding as in Fig. 1. RC-H Ser256 and LH1-α1 Arg19 are highlighted
using space-fill. The RC-H loop Leu47 to Pro54 is highlighted in orange.
b, Summary of intra- and inter-subunit interactions in the LH1 complex.
Only transmembrane helices of LH1 polypeptides are shown for clarity.
All arrows indicate hydrogen-bonding interactions.
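The proposed assembly sequence and the subunit composition can be tied together in a toy model. This is a hypothetical illustration, not a physical simulation: α-β pairs are added one at a time from the RC-H anchor (subunit 1), and we arbitrarily assume that each γ-polypeptide approaches from the side facing the next position (n + 1), so it can dock only if that position is still empty. Pigment counts use two BChl b and one carotenoid per α-β pair, and two hydrogen bonds per γ-polypeptide, as described above.

```python
from dataclasses import dataclass
from typing import Dict, List

N_POSITIONS = 17  # alpha-beta positions observed around the RC


@dataclass
class Subunit:
    index: int       # position number; 1 = the subunit anchored on RC-H
    has_gamma: bool  # whether a gamma-polypeptide docked at this position


def assemble_ring(n: int = N_POSITIONS) -> List[Subunit]:
    """Toy model of sequential LH1 assembly (hypothetical docking rule).

    Gamma(i) is assumed to need the next position (i + 1) to be empty when
    it approaches. That holds for every subunit except the last, whose next
    neighbour is subunit 1, already in place.
    """
    occupied = set()
    ring: List[Subunit] = []
    for i in range(1, n + 1):
        occupied.add(i)
        next_position = 1 if i == n else i + 1
        ring.append(Subunit(i, has_gamma=next_position not in occupied))
    return ring


def composition(ring: List[Subunit]) -> Dict[str, int]:
    """Count polypeptides, pigments and gamma hydrogen bonds in the ring."""
    gammas = sum(s.has_gamma for s in ring)
    return {
        "alpha": len(ring),
        "beta": len(ring),
        "gamma": gammas,
        "BChl b": 2 * len(ring),      # two per alpha-beta pair
        "carotenoid": len(ring),      # one per alpha-beta pair
        "gamma H-bonds": 2 * gammas,  # two per gamma-polypeptide
    }


ring = assemble_ring()
print([s.index for s in ring if not s.has_gamma])  # positions lacking gamma
print(composition(ring))
```

Under this rule only position 17 lacks a γ-polypeptide, leaving the gap between subunits 17 and 1, and the counts (16 γ, 34 BChl b, 17 carotenoids, 32 γ hydrogen bonds) match the composition described in the text.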


LH1-RC interactions

The resolution of the cryo-EM structure of the LH1-RC complex is
sufficient to enable detailed analysis of the protein-protein and
protein-pigment interactions within the complex. The protein-pigment
interactions in the RC have previously been described in detail, and
the relationship between the RC and its encircling LH1 can now be
defined. Figure 4a shows the overall organization of the LH1-RC
complex, which is divided into three zones. Zone 1 (AOC) includes
a close contact between LH1 and the RC, a hydrogen bond between
LH1-α1-Arg19 and RC-H-Ser256 (2.8 Å), which is likely to be the site
for initiating encirclement by LH1 in a manner analogous to that in
the LH1-RC-PufX complex of Rba. sphaeroides. This trimeric α-β-γ
subunit is designated as LH1 subunit 1 (Fig. 1f). Proximity between
transmembrane helix RC-L and LH1-α2, with a centre-to-centre helix
distance of approximately 10 Å, could facilitate the encirclement
process. A third interaction in this region involves LH1-α3 and LH1-α4 on
the cytoplasmic side, which constrains a loop on RC-H (Leu47-Pro54).
In zone 2 (COB) there is a single point of contact between the RC and
LH1, between an RC-M helix and LH1-α9. The gap between the RC
and LH1 in this region is mainly filled by the single transmembrane
helix of RC-H and lipid molecules (Extended Data Fig. 4a). Zone 3 (BOA)
is where quinol-quinone exchange occurs at the RC QB site, and where
newly released quinols and quinones arriving from outside create a
dynamic quinone pool. Accordingly, the gap between the RC
and LH1 in this region shows disordered densities arising from lipids
and quinones (Extended Data Fig. 4a). Figure 4b summarizes all intra-
and inter-subunit protein-protein and protein-pigment interactions
in the LH1 complex, and highlights the extent of the interactions that
stabilize the LH1 complex.


Structural basis for the redshift 

The LH1-RC complex of Blc. viridis is able to absorb energy in the
infrared region of the spectrum owing to the unusually large redshift
it imposes on the BChl b pigment; its 1,015-nm absorption maximum
represents, to our knowledge, the lowest-energy light used by a
photosynthetic bacterium. Previous studies have identified several influences
on the redshift of the BChl a or BChl b Qy absorption maximum in
bacterial light-harvesting complexes. The cryo-EM structure of the
Blc. viridis LH1-RC complex shows that at least five factors contribute
to the large bathochromic shift of the BChl b Qy band.

The first factor is the chemical structure. The extra C-C double bond
in BChl b relative to BChl a extends conjugation in the bacteriochlorin
ring and redshifts the Qy band. The 795-nm absorption maximum of
BChl b in methanol is 24 nm further towards the near infrared than that
of BChl a, which directly affects the ‘site energy’ within coupled BChl
b aggregates in the LH1-RC complex.

The second factor is protein-pigment interactions. As already noted,
the carotenoids interlink LH1 α-β subunits (Fig. 3c), and the
C3-acetyl groups of α- and β-BChls b hydrogen bond to Trp residues
in LH1 (Fig. 3b), adopting an in-plane conformation similar to those
of the B800-850 BChls a in the LH2 complex of Rhodopseudomonas
acidophila. Experiments combining mutagenesis and Raman
spectroscopy have shown that hydrogen bonds redshift the absorption
of the Rba. sphaeroides LH1 complex.

The third factor is the number of coupled BChl molecules;
the 17 pairs of coupled BChl b molecules in the LH1-RC complex
of Blc. viridis represent the largest reported circular aggregate of
pigments in light-harvesting complexes from photosynthetic bacteria.
Increasing the oligomeric size of LH1 subunits in the LH1 complex
of Rba. sphaeroides from 2 to 6 or 7 is accompanied by redshifts of
6-7 nm in absorption and fluorescence emission of the BChl a Qy
band, although larger oligomers produced no further redshifts.

The fourth factor is the structure of the BChl a or BChl b aggregates.
The Mg-Mg distances within BChl pairs reflect the degree of overlap,
and therefore the electronic coupling and Qy redshifting, of BChl a
and BChl b in light-harvesting complexes. Extended Data Fig. 5 shows
the linear correlation of Qy-band maximum versus inter- and
intra-subunit Mg-Mg distances in five different light-harvesting complexes;
the correlation is stronger for the intra-subunit distances. The
intra-subunit (8.8 Å) and inter-subunit (8.5 Å) Mg-Mg distances of BChl
b in Blc. viridis are the shortest reported for a bacterial light-harvesting
complex.
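The comparison in Extended Data Fig. 5 rests on a straight-line fit of Qy maximum against Mg-Mg distance, with the strength of the correlation judged separately for intra- and inter-subunit distances. The sketch below shows one way to compute such an ordinary least-squares fit and its Pearson coefficient. The distances and band positions for the five complexes are in that figure and are not reproduced here, so the function name and any inputs are placeholders.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # sign follows the slope
    return slope, intercept, r
```

Running it once on intra-subunit and once on inter-subunit Mg-Mg distances, each against the same Qy maxima, and comparing |r| would reproduce the kind of comparison described above; with only five complexes, the difference in r should be read cautiously.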

Finally, the shift is affected by the structural rigidity enforced
by the γ-apoproteins. Sixteen γ-apoproteins pack tightly between
β-polypeptides, and collectively interlock the LH1 structure through
32 hydrogen bonds to α- and β-polypeptides, constraining free
movement of the LH1 ring and stabilizing the BChl b pairs in the complex,
thereby contributing to the redshift of the BChl b Qy band.
There are parallels with the large redshift of BChl a to 915 nm in
the LH1-RC complex from Tch. tepidum (Extended Data Fig. 6a).
In this case, bound Ca²⁺ ions constrain conformational flexibility
and limit disorder in site energies. Inhomogeneous narrowing is
accompanied by mixing of charge transfer and lowest exciton states,
which has been hypothesized to be the basis for the redshift in this
complex.


A template for preparing quinols for export

The LH1-RC of Blc. viridis houses the RC quinones QA and QB, and
a third quinone, QP (Fig. 5a-c). The binding sites of QA and QB are
similar to those reported previously, although their tail structures differ
from those in isolated RCs (Extended Data Fig. 6b, c). QP is located
near the gap in the LH1 ring, 48.9 Å away from the QB-binding site. The
head of the QP molecule is stabilized by π-π interactions with RC-L-Phe40,
the aromatic ring of which is roughly parallel to the plane of the
quinone-head group at a distance of 3.6 Å. QP is also in close proximity
(3.0 Å) to LH1-α1-Tyr27, the aromatic ring of which is roughly
perpendicular to the plane of the QP-head ring. Unlike QA and QB in the
RC, the tail of QP is not free to move, and is instead conformationally
constrained by a series of contacts with LH1-α(1)-Phe37 (4.7 Å),
RC-L-Gln87 (2.4 Å), RC-L-Trp142 (3.5 Å) and RC-L-Val91 (4.4 Å) (Extended
Data Fig. 6d). This QP-binding pocket provides a folding template such
that QP assumes a compact conformation and a suitable orientation
before entering the pore in LH1 at the position of the absent 17th
γ-apoprotein (Fig. 5c, d).

Fig. 5 | A quinone-quinol channel in the LH1-RC core complex.
a, The LH1-RC complex viewed from the periplasmic side, with 80%
transparency applied to the RC and LH1 subunits 9-17. A green arrow
indicates the gap between subunits 1 and 17. b, LH1-RC rotated 90° from
a, with QB and QP viewed by removing LH1 subunits 1-8. c, Close-up of
the QP-binding pocket. d, Ribbon representation of the QP region. Green
arrow as in a. e, Close-up view of the quinone-quinol channel (dashed
circle) from outside the LH1 ring. f, Electron densities of pigments
adjacent to the LH1 pore.


The cryo-EM structure of the LH1-RC complex reveals the mechanism
by which quinone is translocated across the LH1 ring. Of the
17 subunits, 16 are α-β-γ heterotrimers, and only one is an α-β
heterodimer. The 16 γ-polypeptides, located outside the β-ring, pack
between β-apoproteins, leaving a gap in the LH1 ring between
subunits 1 and 17, and dictate the position of the pore for quinone-quinol
translocation. The QP-binding pocket is located next to the pore, and
the QP molecule appears to be folded and oriented in the binding
pocket in a manner that encourages passage through the LH1 ring
(Fig. 5d). There is a distinct pore measuring around 5 × 7 Å between
α17 and α1, which is created by Arg18-Phe25 (sequence RRVLTALF)
in α17 and Leu15-Leu24 (LDPRRVLTAL) in α1 (Fig. 5e). Note that
the electron densities of β(17)-BChl b, α(1)-BChl b and the
α(1)-β(1) carotenoid are weaker than those of
their counterparts in the rest of the LH1 complex. This is particularly
evident for those regions of the pigments that are close to the quinone
pore, for example the phytyl tails and one end of the carotenoid, as
shown in Fig. 5f. This weaker density reflects the relative flexibility of
this region; thus the size of this pore could fluctuate transiently,
facilitating the movement of the quinone and quinol molecules through
the channel.
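The two pore-lining stretches quoted above come from α-polypeptides in different subunits, so, assuming α17 and α1 share one sequence, their residue numbering should agree where they overlap. A small consistency check follows; the fragment covers only the 11 residues implied by the two quotes (Leu15 to Phe25) and is not the full α sequence, and the helper name `residue_range` is illustrative.

```python
def residue_range(fragment: str, first_residue: int, start: int, end: int) -> str:
    """Return residues start..end (inclusive, protein numbering) from a
    sequence fragment whose first character is residue `first_residue`."""
    last_residue = first_residue + len(fragment) - 1
    if start > end or start < first_residue or end > last_residue:
        raise ValueError("requested range lies outside the fragment")
    return fragment[start - first_residue : end - first_residue + 1]


# Fragment assumed from the two quoted stretches, merged on their overlap.
ALPHA_15_25 = "LDPRRVLTALF"

assert residue_range(ALPHA_15_25, 15, 15, 24) == "LDPRRVLTAL"  # quoted for alpha1
assert residue_range(ALPHA_15_25, 15, 18, 25) == "RRVLTALF"    # quoted for alpha17
assert residue_range(ALPHA_15_25, 15, 19, 19) == "R"           # alpha1-Arg19
```

The last check also agrees with the α1-Arg19 residue that hydrogen bonds to RC-H-Ser256, so the numbering used for the pore and for the RC anchor point is internally consistent.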
METHODS


Protein purification. Wild-type Blc. viridis (DSM-133) was obtained from
DSMZ. Photosynthetic cultures of Blc. viridis were grown in sodium succinate
medium 27 (N medium) under illumination (100 μmol photons per m² per s) at
30 °C in 20-l screw-capped vessels, completely filled with N2-sparged medium,
as previously described. Cells were collected by centrifugation at 3,290g for
30 min when the culture reached an optical density of 1.6 at 680 nm. Washed cells were
broken by passage through a French press three times at 18,000 p.s.i. The crude
cell lysate was applied to a two-step sucrose gradient (15% and 40% (w/w) in an
ultracentrifugation tube). Photosynthetic membrane was collected at the interface
of 15% and 40% sucrose after 5 h centrifugation at 100,000g. Membranes were
pelleted and resuspended in working buffer (20 mM HEPES, pH 7.8). The optical
density of the membrane was adjusted to ~100 at 1,015 nm. For solubilization of
the core complexes, the optical density at 1,015 nm of the photosynthetic
membrane was adjusted to 60, and 3% (w/w) n-dodecyl β-D-maltoside was added. This
mixture was then stirred in the dark at 4 °C for 30 min. Unsolubilized material
was removed by centrifugation for 1 h at 211,000g. The clarified supernatant was
loaded onto an ion exchange column pre-equilibrated with working buffer
containing 0.03% n-dodecyl β-D-maltoside. The core complexes eluted at
~250 mM NaCl and were collected and concentrated. These were further purified
using a Superdex 200 gel filtration column. The fractions with an absorption ratio
A1,008 nm/A280 nm higher than 1.22 were pooled and used for cryo-EM
data collection.
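Two routine calculations in this protocol, diluting membranes to a target optical density and pooling gel-filtration fractions by their A1,008/A280 ratio, can be sketched as follows. The helper names and fraction identifiers are hypothetical, and the dilution step assumes absorbance scales linearly with concentration (Beer-Lambert) at 1,015 nm.

```python
from typing import Iterable, List, Tuple


def dilution_volume(od_stock: float, od_target: float, final_volume_ml: float) -> float:
    """Volume of stock (ml) to bring to final_volume_ml so that OD = od_target.

    Uses C1 * V1 = C2 * V2, i.e. assumes OD is proportional to concentration.
    """
    if not 0 < od_target <= od_stock:
        raise ValueError("target OD must be positive and not exceed the stock OD")
    return od_target * final_volume_ml / od_stock


def pool_fractions(fractions: Iterable[Tuple[str, float, float]],
                   min_ratio: float = 1.22) -> List[str]:
    """Keep fractions whose A1008/A280 ratio is strictly above min_ratio.

    `fractions` holds (fraction_id, a1008, a280) tuples.
    """
    return [fid for fid, a1008, a280 in fractions
            if a280 > 0 and a1008 / a280 > min_ratio]


# Membranes at OD ~100 diluted to OD 60 for solubilization, 10 ml final volume.
print(dilution_volume(100, 60, 10))  # ml of membrane stock
# Hypothetical fractions: only the first passes the 1.22 cut-off.
print(pool_fractions([("F1", 1.30, 1.0), ("F2", 1.10, 1.0), ("F3", 1.22, 1.0)]))
```

Because the protocol states "higher than 1.22", the comparison is strict, so a fraction sitting exactly at the cut-off is excluded.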

Cryo-EM data collection. The protein concentration was adjusted to an optical
density of 40 at 1,008 nm. Three microlitres of protein solution were applied to
a glow-discharged holey carbon grid (Quantifoil grid R1.2/1.3, 300 mesh Cu).
The grid was plunged into liquid ethane cooled by liquid nitrogen using a Leica
EM GP vitrobot. Parameters were set as follows: blotting time 4 s, humidity 99%,
sample chamber temperature 5 °C. The frozen grid was stored in liquid
nitrogen before use. A second grid was prepared using a Quantifoil grid R3.5/1.0
covered by a thin carbon film (EM resolution, Inc.), with the protein diluted
tenfold. Vitrification conditions were the same as for the first grid. Data were
recorded at eBIC on a Titan Krios electron microscope operating at 300 kV
accelerating voltage, equipped with a Gatan 968 GIF Quantum energy filter and a
K2 Summit detector, at a nominal magnification of 130,000× in counting mode.
Movies were collected in super-resolution mode and Fourier-cropped to give a
calibrated pixel size of 1.06 Å at the specimen level. An energy-selecting slit
of 20 eV was used. An exposure rate of 5 electrons per pixel per s was set, and a
fresh super-resolution gain reference was acquired at this dose rate before data
acquisition. A total dose of 45 electrons per Å² was used for movies of 20 frames.
In total, 6,472 movies were collected with defocus values from 1.0 to 3.0 μm. Two
typical cryo-EM images, which are averaged from motion-corrected movie frames, are
shown in Extended Data Fig. 7a, b.
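The exposure settings above fix the exposure time, the dose per frame and the sampling limit. This arithmetic sketch assumes that "pixel" in the exposure rate refers to the 1.06 Å calibrated pixel; if the rate was quoted per super-resolution pixel, the exposure time would differ.

```python
PIXEL_A = 1.06            # calibrated pixel size, Å (from the text)
RATE_E_PER_PX_S = 5.0     # exposure rate, electrons per pixel per second
TOTAL_DOSE_E_PER_A2 = 45  # total dose, electrons per Å^2
N_FRAMES = 20             # frames per movie

# Assumption: the exposure rate is per 1.06 Å calibrated pixel.
rate_e_per_a2_s = RATE_E_PER_PX_S / PIXEL_A ** 2  # electrons per Å^2 per s
exposure_s = TOTAL_DOSE_E_PER_A2 / rate_e_per_a2_s
dose_per_frame = TOTAL_DOSE_E_PER_A2 / N_FRAMES
nyquist_a = 2 * PIXEL_A   # finest spacing representable at this sampling

print(f"{rate_e_per_a2_s:.2f} e/Å²/s, {exposure_s:.1f} s exposure, "
      f"{dose_per_frame:.2f} e/Å² per frame, Nyquist {nyquist_a:.2f} Å")
```

Under that assumption each movie spans about 10 s at 2.25 e⁻/Å² per frame, and the 2.12 Å Nyquist limit lies beyond the reported 2.9 Å resolution, so the pixel size does not limit the map.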

Data processing. All images that were empty, contained few particles or were 
contaminated with ice were discarded. Dose-fractionated images were subjected 
to beam-induced motion correction using MotionCorr. Images derived from the 
sum of all frames were used for further data processing in RELION 2.0. 
CTF parameters were determined using Gctf. In total, 267,726 particles were 
picked manually. These particles were subjected to reference-free 2D classification, 
and particles that fell into poorly defined classes were rejected. This 
2D-classification cleaning procedure was repeated three times, resulting in 
rejection of 9.45% of the total particles. The resulting 2D classes were used to 
calculate an initial 3D model with EMAN2 for maximum-likelihood-based 3D 
classification. One of the four stable 3D classes, accounting for 62.3% of the total 
particles, was selected for high-resolution refinement and 3D reconstruction without 
subtraction of the detergent micelle from the raw micrographs. This resulted in a map 
at a global resolution of 3.3 Å. The density map was corrected for the modulation 
transfer function (MTF) of the Gatan K2 Summit camera and further sharpened 
with the post-processing subroutine in RELION 2.0 using an estimated temperature 
factor; a mask was created in RELION 2.0 with a low-pass filter of 15 Å and 
a soft edge of 7 Å. The Fourier shell correlation (FSC) curve corrected for masking 
is shown in Extended Data Fig. 7c. The final resolution of 2.9 Å for the 
LH1-RC map was estimated using an FSC cut-off of 0.143. ResMap was used to 
calculate the local resolution map (Extended Data Fig. 4b, c). 
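The 0.143 criterion reads the resolution off the point where the masked FSC curve first falls below the threshold. The minimal sketch below assumes the curve is supplied as (spatial frequency in 1/Å, FSC) pairs sorted by increasing frequency. It interpolates linearly between neighbouring shells; RELION's own interpolation details may differ. The example curve is hypothetical and is not the published data.

```python
from typing import Optional, Sequence, Tuple


def resolution_at_threshold(curve: Sequence[Tuple[float, float]],
                            threshold: float = 0.143) -> Optional[float]:
    """Return the resolution (A) where the FSC first drops below `threshold`.

    `curve` holds (spatial_frequency_in_1_per_A, fsc) pairs sorted by
    increasing frequency. The crossing frequency is found by linear
    interpolation between the last shell at or above the threshold and the
    first shell below it. Returns None if the curve never crosses it.
    """
    for (f0, c0), (f1, c1) in zip(curve, curve[1:]):
        if c0 >= threshold > c1:
            # Fraction of the way from shell f0 to shell f1 at which FSC == threshold.
            frac = (c0 - threshold) / (c0 - c1)
            f_cross = f0 + frac * (f1 - f0)
            return 1.0 / f_cross
    return None


# Hypothetical, illustrative curve (not the published FSC data).
example_curve = [(0.05, 1.00), (0.15, 0.98), (0.25, 0.90),
                 (0.30, 0.60), (0.35, 0.20), (0.40, 0.05)]
print(f"{resolution_at_threshold(example_curve):.2f} A")
```

Interpolating between shells avoids snapping the reported resolution to the nearest shell edge. Whether the input curve is the masked or the unmasked FSC changes the answer, which is why the text specifies the masking-corrected curve.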
Modelling and refinement. Initially, the crystal structure of the Blc. viridis RC 
(PDB: 1PRC) was fitted to the cryo-EM map as a rigid body using the 'fit in map' 
routine in Chimera. COOT was then used for manual adjustment and real-space 
refinement of both polypeptides and cofactors. The amino acid sequences of all 
polypeptides in the RC are listed in Extended Data Fig. 8. Ubiquinone-9 molecules 
(QB and QP) were also fitted to the density map independently using COOT. 

For LH1, the electron density of LH1 subunit 3 was selected for modelling first. 
On the basis of structural similarity to the LH1 of Tch. tepidum 
and the LH2 of Rhodospirillum molischianum, the positions of the His residues that 
ligate BChl b molecules in the α-β-polypeptides (Extended Data Fig. 9) were 
identified in the density map. The fitted RC was used as a reference to determine 
the orientation of the α-β-polypeptides. Their amino acid sequences, taken from 
previous work, were fitted into the electron-density map using COOT. Two BChl 
b molecules and one all-trans carotenoid were added to the model on the basis of 
their densities. Analysis of pigment composition shows that the major carotenoid 
in the core complex is all-trans-1,2-dihydroneurosporene; this carotenoid was 
therefore modelled into the density map. Having no His residues, the γ-polypeptide 
does not bind BChl b molecules. No 3D structural information on the γ-subunit 
was available, but the 2.9 Å resolution allows assignment of larger amino acid 
side chains such as Trp and Tyr. By matching three Trp residues and one Tyr residue in 
the γ-polypeptide, its orientation was determined, and all other residues were 
traced on the basis of the density map using COOT. Comparison with the sequence of 
the γ-polypeptide leaves 12 N-terminal residues unaccounted for. The structure 
of the LH1 α-β-γ-subunit was then used as a rigid body to fit the density 
of the other LH1 subunits. For LH1 subunit 17, only α-β and pigments 
were used. All of the LH1 subunits then underwent real-space refinement using 
COOT. The final model was subjected to global refinement and minimization 
using REFMAC5. The final refinement statistics are summarized in Extended 
Data Table 1. The quality of fit of the structural model within the electron-density 
map was validated using EMRinger. 
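Locating the bulky aromatic side chains that serve as landmarks for sequence registration is a simple sequence scan. The sketch below lists the Trp and Tyr positions in the LH1 γ sequence as printed in Extended Data Fig. 8. It assumes 1-based numbering from the initiator Met; mature-protein numbering may differ. The scan only identifies candidate landmarks. Matching them to the density was done by hand in COOT.

```python
# LH1 gamma sequence as printed in Extended Data Fig. 8
# (numbering assumed to start at the initiator Met).
LH1_GAMMA = "MKLSAILGALSVVLTSTIASAYFAADGSVVPSISDWNLWVPLGILGIPTIWIALTYR"


def landmark_positions(seq: str, residues: str = "WY") -> dict:
    """Map each one-letter code in `residues` to its 1-based positions in `seq`."""
    return {aa: [i + 1 for i, res in enumerate(seq) if res == aa] for aa in residues}


def in_modelled_region(positions, n_missing_nterm: int):
    """Keep positions that lie after the unmodelled N-terminal residues."""
    return [p for p in positions if p > n_missing_nterm]


if __name__ == "__main__":
    marks = landmark_positions(LH1_GAMMA)
    print(marks)
    # 12 N-terminal residues are reported as unaccounted for in the model.
    print({aa: in_modelled_region(p, 12) for aa, p in marks.items()})
```

Under this numbering, the printed sequence has three Trp residues (36, 39 and 51) and two Tyr residues (22 and 56) beyond residue 12. The text reports that three Trps and one Tyr were matched to density.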
Data availability. The cryo-EM density map has been deposited in the World 
Wide Protein Data Bank (wwPDB) under accession code EMD-3951 and the 
coordinates have been deposited in the Protein Data Bank (PDB) under accession 
number 6ET5. 
Extended Data Fig. 1 | Absorption spectra of photosynthetic 
membranes and the purified LH1-RC core complex from Blc. viridis. 
Absorption spectra of isolated membranes (dashed line) and the purified 
LH1-RC complex (solid line) were recorded at room temperature and 
normalized at their Qy bands at 1,015 nm and 1,008 nm, respectively. The peak at 
831 nm, together with a shoulder at ~970 nm, arises from BChl b in the RC. 
Bacteriopheophytin appears as a poorly resolved peak at about 810 nm. 
The Qx bands give rise to a composite peak at 602 nm. The minor peak 
at about 558 nm arises from the cytochromes, whose Soret band 
contributes to absorption in the region around 410 nm. Absorption features at 
482, 450 and 420 nm correspond to carotenoids, and the 399-nm maximum 
corresponds to the Soret band of BChl b in the core complex. No oxidized 
BChl b is observed; if present, it would give rise to an absorption peak at 
about 685 nm. 
Extended Data Fig. 2 | Residue-residue distance deviation 
between cryo-EM and X-ray structures of the RC from Blc. viridis. 
a, Superposition of the X-ray structure (PDB: 1PRC, grey) and the 
cryo-EM structure (colour-coded as in Fig. 1) of the RC. A putative hinge 
point is indicated with a red dot. The bending direction of the cryo-EM 
structure is indicated with two green arrows. A red arrow points to a 
flexible RC-H loop. b-e, Residue-residue (RR) distance deviation maps 
of the individual RC subunits C, M, L and H, respectively, comparing the 
structures from cryo-EM and X-ray crystallography (PDB: 1PRC). Each 
vertical scale shows the standard deviation (s.d.) in Å. The flexible loop of 
RC-H is indicated with a red perpendicular arrow in e. 
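A residue-residue distance-deviation map compares two models without superposing them. Each model's internal pairwise distances are computed and then differenced. The minimal sketch below does this for matched residue coordinates (for example, Cα atoms). It is an illustration of the idea only. The exact statistic plotted in the figure, reported there as an s.d., may be defined differently in the cited method.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rr_deviation_map(model_a: Sequence[Coord], model_b: Sequence[Coord]) -> List[List[float]]:
    """Absolute difference of the two intra-model distance matrices.

    Both models must list the same residues in the same order. Because only
    internal distances are compared, no superposition is needed.
    """
    if len(model_a) != len(model_b):
        raise ValueError("models must contain the same residues")
    da, db = distance_matrix(model_a), distance_matrix(model_b)
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(da, db)]


def per_residue_deviation(dev_map: List[List[float]]) -> List[float]:
    """Mean deviation of each residue to all others (a simple per-residue summary)."""
    n = len(dev_map)
    return [sum(row) / (n - 1) if n > 1 else 0.0 for row in dev_map]


# Toy usage: a rigid translation leaves every internal distance unchanged.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(x + 10.0, y, z) for x, y, z in a]
print(max(max(row) for row in rr_deviation_map(a, b)))
```

Because the map is invariant to rigid-body motion, a hinge-like bend such as the one marked in panel a shows up as a block of large deviations between the two moving parts.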
Extended Data Fig. 3 | Cryo-EM densities and structural models of polypeptides and pigments in the Blc. viridis LH1-RC complex. The colour code 
is the same as in Fig. 1. The contour levels of the density maps were adjusted to mirror their molecular weights. 
Extended Data Fig. 4 | Electron densities between and outside the LH1 
and RC complexes, and local resolution maps of the LH1-RC core 
complex. a, The LH1-RC complex as shown in Fig. 1f, but displayed at 
70% transparency. Electron densities belonging to detergent, lipid and 
other disordered molecules are in grey. b, Side view of the core complex 
with the periplasmic side uppermost. c, View of the periplasmic side. All 
membrane-extrinsic parts of the complex were truncated for clarity. The 
coloured bar chart on the right shows the local structural resolution in Å. 
Complex                           Qy band (nm)   Mg-Mg (Å)     Mg-Mg (Å)
B800-820 LH2 Rps. acidophila      824            9.51          8.97
B800-850 LH2 Rps. acidophila      858            9.45          9.00
B800-850 LH2 Phs. molischianum    846            9.36          8.95
RC-LH1 Tch. tepidum               915            8.97 ± 0.07   8.56 ± 0.07
RC-LH1 Blc. viridis               1008           8.8 ± 0.1     8.5 ± 0.1

*The asterisk indicates that Mg-Mg distances were calculated to two decimal places for structures 
obtained using X-ray crystallography. For the Tch. tepidum and Blc. viridis structures, the Mg-Mg 
distances differ around the LH1 ring, so standard errors were calculated. 


Extended Data Fig. 5 | Relationship between BChl a and BChl b 
Mg-Mg distances and Qy-band absorption in bacterial light-harvesting 
complexes. a, Correlation of the Qy-band maximum and inter-subunit 
BChl a and BChl b Mg-Mg distances in five bacterial light-harvesting 
complexes. b, As in a, but for intra-subunit Mg-Mg distances. c, Values of 
the linear correlation coefficient R, calculated using least-squares linear 
regression (n = 5 biologically independent samples in each case; one-sided 
significance test). 
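Panel c reports least-squares correlation coefficients for n = 5 complexes. As a minimal sketch of that calculation, the code below computes Pearson's R between the Qy maxima and the first Mg-Mg column of the table above, using the tabulated means without their errors. The extracted table does not state which Mg-Mg column is inter-subunit and which is intra-subunit, so the column is labelled generically. The one-sided significance test mentioned in the caption is not reproduced.

```python
import math
from typing import Sequence

# Values transcribed from the table above (Qy maximum in nm, first Mg-Mg column in A).
QY_NM = [824, 858, 846, 915, 1008]
MG_MG_COL1_A = [9.51, 9.45, 9.36, 8.97, 8.8]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (the signed square root of R^2 of a least-squares line)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    r = pearson_r(MG_MG_COL1_A, QY_NM)
    print(f"R = {r:.3f}")
```

A negative R here means that shorter Mg-Mg distances go with a more red-shifted Qy band across these five complexes.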
Extended Data Fig. 6 | Structural comparisons of selected cofactors 
and details of the QP binding site. a, The LH1-B1008 BChl b pair from 
Blc. viridis (blue) compared with the LH1-B915 BChl a pair (green) from 
the X-ray structure of the Tch. tepidum LH1-RC complex (PDB: 3WMM). 
b, Comparison of QA (menaquinone-9; blue) from the cryo-EM model 
of the Blc. viridis LH1-RC with QA (green) from the X-ray structure of 
the Blc. viridis RC (PDB: 3T6E). c, As in b, but comparing QB. d, The QP 
binding site. Only LH1-α1 and part of RC-L are shown for clarity. Yellow, 
LH1-α1; orange, RC-L; blue, QP; wheat, QB. Amino acid residues making 
close contacts around QP are numbered as follows: 

1. LH1-α1 Tyr27 
2. LH1-α1 Phe37 
3. RC-L Phe40 
4. RC-L Glu87 
5. RC-L Val91 
6. RC-L Trp142 
Extended Data Fig. 7 | Cryo-EM micrographs of the LH1-RC complex 
from Blc. viridis and calculation of the cryo-EM map resolution. 
a, Protein particles embedded in vitrified ice. Examples of LH1-RC 
complexes are circled. In total, 6,472 cryo-EM movies were recorded, from which 
267,726 particles were picked manually for reference-free two-dimensional 
classification. During data processing, datasets of around 100,000 and 
around 167,000 particles were used independently for 3D reconstruction. 
They generated very similar 3D maps for the LH1-RC complex, so 
they were combined. b, The LH1-RC particles are covered by a thin 
layer of vitrified ice on a supported carbon film. Each image measures 
393.2 × 406.8 nm. c, Gold-standard refinement was used to estimate 
the final map resolution. The global resolution of 2.9 Å was calculated 
using an FSC cut-off of 0.143. 
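The quoted image size follows from the detector frame size and the calibrated pixel size. The minimal sketch below assumes a K2 Summit counting-mode frame of 3,710 × 3,838 pixels. That detector specification is not stated in this text and is an assumption here; the axes are ordered to match the caption.

```python
PIXEL_SIZE_A = 1.06          # calibrated pixel size at the specimen (A), from the Methods
DETECTOR_PX = (3710, 3838)   # assumed K2 Summit frame size in pixels (not stated in the text)


def field_of_view_nm(shape_px, pixel_size_a):
    """Return the image dimensions in nm for a frame of `shape_px` pixels."""
    return tuple(n * pixel_size_a / 10.0 for n in shape_px)  # 10 A = 1 nm


if __name__ == "__main__":
    w, h = field_of_view_nm(DETECTOR_PX, PIXEL_SIZE_A)
    print(f"{w:.1f} x {h:.1f} nm")
```

Under this assumption the result, about 393.3 × 406.8 nm, agrees with the caption to within about 0.1 nm.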
LH1 α: 
MATEYRTASWKLWLILDPRRVLTALFVYLTVIALLIHFGLLSTDRLNWWEFQRGLPKAASLVVVPPAVG 

LH1 β: 
MADLKPSLTGLTEEEAKEFHGIFVTSTVLYLATAVIVHYLVWTARPWIAPIPKGWVNLEGVQSALSYLV 

LH1 γ: 
MKLSAILGALSVVLTSTIASAYFAADGSVVPSISDWNLWVPLGILGIPTIWIALTYR 

RC-M: 
MADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFM 
TLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGT 
AVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK 

RC-H: 
MYHGALAQHLDIAQLVWYAQWLVIWTVVLLYLRREDRREGYPLVEPLGLVKLAPEDGQQVYELPYPKTFVLPHGGTVTVPRRRPETRELKLAQTDGFEGAPLQPTGNPLVDAVGPASYAERA 
EVVDATVDGKAKIVPLRVATDFSIAEGDVDPRGLPVVAADGVEAGTVTDLWVDRSEHYFRYLELSVAGSARTALIPLGFCDVKKDKIVVTSILSEQFANVPRLQSRDQITLREEDKVSAYY 
AGGLLYATPERAESLL 

RC-L: 
MALLSFERKYRVRGGTLIGGDLFDFWVGPYFVGFFGVSAIFFIFLGVSLIGYAASQGPTWDPFAISINPPDLKYGLGAAPLLEGGFWQAITVCALGAFISWMLREVEISRKLGIGWHVPLA 
FCVPIFMFCVLQVFRPLLLGSWGHAFPYGILSHLDWVNNFGYQYLNWHYNPGHMSSVSFLFVNAMALGLHGGLILSVANPGDGDKVKTAEHENQYFRDVVGYSIGALSIHRLGLFLASNIF 
LTGAFGTIASGPFWTRGWPEWWGWWLDIPFWS 

RC-Cyt: 
MKQLIVNSVATVALASLVAGCFEPPPATTTQQTGFRGLSMGEVLHPATVKAKKERDAQYPPALAAVKAEGPPVSQVYKNVKVLGNLTEAEFLRTMTAITEWVSPQQEGCTYCHDENNLASEAK 
YPYVVARRMLEMTRAINTNWTQHVAQTGVTCYTCHRGTPLPPYVRYLEPTLPLNNRETPTHVERVETRSGYVVRLAKYTAYSALNYDPFTMFLANDKRQVRVVPQQTALPLVGVSRGKERRP 
LSDAYATFALMMSISDSLGTNCTFCHNAQTFESWGKKSTPQRAIAWWGIRMVRDLNMNYLAPLNASLPASRLGRQGEAPQADCRTCHQGVTKPLFGASRLKDYPELGPIKAAAK 

Extended Data Fig. 8 | Amino acid sequences of polypeptides in the LH1-RC complex from Blc. viridis. Black, genome sequence; red, protein 
sequence; blue, missing in protein sequence. 
α-polypeptides 

Blc. viridis (P04123)            MATEYRTASWKLWLILDPRRVLTALFVYLTVIALLIHFGLLSTDRLNWWEFQRGLPKAASLVV--VPPAVG 
Rps. rubrum (Q2RQ24)             MWRIWQLFDPRQALVGLATFLFVLALLIHFILLSTERFNWLEGASTKPVQTSMVMPSS---DLAV 
Rba. sphaeroides (Q3J1A4)        MSKFYKIWMIFDPRRVFVAQGVFLFLLAVMIHLILLSTPSYNWLEISAAKYNRVAVAE 
Rba. capsulatus (P02948)         MSKFYKIWLVFDPRRVFVAQGVFLFLLAVLIHLILLSTPAFNWLTVATAKHGYVAAAQ 
Phs. molischianum (Q9R4K5)       MWKIWTLYDPRRTLSGLFTFLTVLGLLIHFLLLSTDRFNWLDGAREAHNV 
Phs. molischianum LH2 (P97253)   --MSNPKDDYKIWLVINPSTWLPVIWIVATVVAIAVHAAVLAAPGFNWIALGAAKSAAK 
Rps. palustris (Q6N9L4)          --MWRIWLLFDPRRALVLLFVFLFGLAIIIHFILLSTSRFNWLDGPRAAKAASIS-LPFTPPSMPV 
Tch. tepidum (D2Z0P2)            -MFTMNANLYKIWLILDPRRVLVSIVAFQIVLGLLIHMIVLSTD-LNWLDDNIPVSYQALGKK 

β-polypeptides 

Blc. viridis (P04124)            ----MADLKPSLTGLTEEEAKEFHGIFVTSTVLYLATAVIVHYLVWTARPWIAPIPKGWVNLEGVQSALS-----YLV 
Rps. rubrum (P02950)             ---MADKNDLSFTGLTDEQAQELHAVYMSGLSAFIAVAVLAHLAVMIWRPWE 
Rba. sphaeroides (Q2RQ23)        ---MAEVKQESLSGITEGEAKEFHKIFTSSILVFFGVAAFAHLLVWIWRPWV-PGPNGYSALETLTQTLT----YLS 
Rba. capsulatus (P95673)         ----MAERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF 
Phs. molischianum (Q3J1A3)       ---MADKSDLGYTGLTDEQAQELHSVYMSGLWLFSAVAIVAHLAVYIWRPWF 
Phs. molischianum LH2 (D2Z0P1)   --MAEQKSLTGLTDDEAKEFHAIFMQSMYAWFGLVVIAHLLAWLYRPWL 
Rps. palustris (Q6N9L5)          ------MSDGSISGLSEAEAKEFHSIFVTSFFLFIVVAVVAHILAWMWRPWL-PKATGYAMDSVHQLTSF--LC 
Tch. tepidum (O32409)            MANSFVRGGGTLSGLSESEAQEFHGIFVTSFISFIVVAIDAHFLAWKWRPWL-PGVKGYALLDNASTAAQSVLSTLV 

Extended Data Fig. 9 | Amino acid sequence alignment of LH1 α- and 
β-polypeptides in LH1-RC core complexes from purple photosynthetic 
bacteria. All sequences have been aligned relative to the His residue that 
ligates BChls in the LH1 complexes. The α- and β-polypeptides of the 
Phs. molischianum LH2 complex are included for comparison. The sequence 
alignment was performed using CLUSTAL O v.1.2.4. 
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics 

Parameter                                 (EMDB-3951, PDB 6ET5) 
Data collection and processing 
  Magnification                           130,000 
  Voltage (kV)                            300 
  Electron exposure (e⁻/Å²)               2.25 (45 e⁻ over 20 frames) 
  Defocus range (μm)                      -1.0 to -3.0 
  Pixel size (Å)                          1.06 
  Symmetry imposed                        C1 
  Initial particle images (no.)           267,762 
  Final particle images (no.)             166,816 
  Map resolution (Å), global              2.9 
    FSC threshold                         0.143 
  Map resolution range (Å)                ~2.5-3.5 
Refinement 
  Initial model used (PDB)                6ETS 
  Model resolution (Å)                    2.9 
    FSC threshold                         0.143 
  Model resolution range (Å)              ~2.5-3.5 
  Map sharpening B factor (Å²)            Estimated automatically using RELION 2.0* 
  Model composition 
    Non-hydrogen atoms                    31,994 
    Protein residues                      3,492 
    Ligands                               75 
  B factors (Å²) 
    Protein                               RELION auto-estimated 
    Ligand                                RELION auto-estimated 
  R.m.s. deviations (Refmac5) 
    Bond lengths (Å)                      0.01 
    Bond angles (°)                       3.21 
Validation 
  wwPDB 
    Clashscore                            27 
    Poor rotamers (%)                     5.5 
  Ramachandran plot (COOT) 
    Favored (%)                           86.01 
    Allowed (%)                           9.67 
    Disallowed (%)                        4.32 
  Refmac5† 
    FSC                                   0.89 (0.62) 
    R factor                              0.26 (0.40) 
    Angle (rms)                           3.21 (3.45) 
    Bond (rms)                            0.01 (0.02) 
    Chiral (rms)                          0.23 (0.31) 
  EMRinger score                          3.34 


ARTICLE 


*Data taken from ref. 5°. 
†These results were calculated from a density map in which electron density contributed by the surrounding belt of detergent was removed by masking. The results from the unmasked model are 
presented in parentheses. 
Structure of photosynthetic LH1-RC supercomplex at 1.9 Å resolution 


Long-Jiang Yu¹, Michihiro Suga¹, Zheng-Yu Wang-Otomo² & Jian-Ren Shen¹* 


Light-harvesting complex 1 (LH1) and the reaction centre (RC) form a membrane-protein supercomplex that performs 
the primary reactions of photosynthesis in purple photosynthetic bacteria. The structure of the LH1-RC complex can 
provide information on the arrangement of protein subunits and cofactors; however, so far it has been resolved only 
at a relatively low resolution. Here we report the crystal structure of the calcium-ion-bound LH1-RC supercomplex of 
Thermochromatium tepidum at a resolution of 1.9 Å. This atomic-resolution structure revealed several new features of 
the organization of protein subunits and cofactors. We describe the loop regions of the RC in their intact states, the interaction 
of these loop regions with the LH1 subunits, the exchange route for the bound quinone QB with free quinone molecules, 
the transport of free quinones between the inside and outside of the LH1 ring structure, and the detailed calcium-ion- 
binding environment. This structure provides a solid basis for the detailed examination of the light reactions that occur 
during bacterial photosynthesis. 


Photosynthesis converts light energy from the Sun into biologically 
useful chemical energy, thereby sustaining virtually all life forms on 
Earth. The photosynthetic apparatus of purple photosynthetic bacteria 
is simple and robust, and has been studied extensively. In most such 
bacteria, there are two types of light-harvesting complex, LH1 and LH2. 
Light energy is first absorbed by the peripheral LH2, then transferred 
via LH1 rapidly and efficiently to the reaction centre (RC) to drive 
the primary photochemical reactions. LH1 exists in all purple bacteria 
and surrounds the RC to form an integral membrane protein-pigment 
supercomplex (LH1-RC), consisting of 32-36 subunits with a total 
molecular weight of approximately 400 kDa. 

The structures of Ca²⁺-bound and Sr²⁺/Ba²⁺-substituted LH1-RC 
supercomplexes have been determined at resolutions of 3.0 Å and 
3.3 Å, respectively, from the thermophilic purple sulfur bacterium 
Thermochromatium tepidum. These structures showed that the RC 
is surrounded by 16 heterodimers of the LH1 αβ-subunits containing 
32 bacteriochlorophyll (BChl) a and 16 spirilloxanthin molecules, 
forming a completely closed elliptical ring. This differs from the 
structures of both the LH1-RC dimers from Rhodobacter sphaeroides and the 
LH1-RC monomers from Rhodopseudomonas palustris, determined 
by X-ray diffraction at resolutions of 8.0 Å and 4.8 Å, respectively. Both 
show incomplete ring structures, with the dimer having a PufX subunit 
and the monomer having the transmembrane helix protein W in the 
ring opening. Furthermore, 16 Ca²⁺-binding sites were observed in 
the C-terminal loop region of the thermophilic LH1 complex; these 
were considered to be responsible for the unusual redshift and enhanced 
thermal stability of the thermophilic LH1. 

The resolution of the LH1-RC structures reported so far, however, has 
not been sufficient to reveal the detailed organization of many of the 
cofactors involved in the energy- and electron-transfer reactions within 
this supercomplex. Here we report the structure of LH1-RC from Tch. 
tepidum at 1.9 Å resolution, which reveals the detailed arrangement 
and organization of a large number of cofactors. On the basis of this 
high-resolution structure, we have examined the energy transfer from 
LH1 to RC, the quinone and proton channels, and the possible roles 
of Ca²⁺. These results greatly advance our understanding of bacterial 
photosynthetic light reactions. 


Overall structure 

Compared with the previous structural determination, the 
quality of the LH1-RC crystals was improved considerably here by 
optimizing the detergent and other crystallization conditions, 
enabling the structure to be solved at 1.9 Å resolution (Extended Data 
Figs. 1 and 2, Extended Data Table 1). The space group of the crystals 
obtained, C2, was the same as that reported in the previous study 
(Extended Data Fig. 1c, d). However, the unit-cell dimensions were 
much shorter than before (Extended Data Table 1), leading to more 
compact packing and a much lower solvent content of 55% (compared 
with 65% for the previous crystals). This could be a major reason for the 
substantial improvement in the crystal resolution. 
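Solvent content and packing density are related through the Matthews coefficient V_M, the unit-cell volume per dalton of protein. The sketch below uses the standard approximation V_solv ≈ 1 - 1.23/V_M, which assumes a protein partial specific volume of about 0.74 cm³ g⁻¹. It converts the two solvent contents quoted above into V_M values. The actual unit-cell parameters are in Extended Data Table 1 and are not used here.

```python
def matthews_from_solvent(solvent_fraction: float) -> float:
    """Matthews coefficient V_M (A^3/Da) from solvent fraction via V_solv = 1 - 1.23/V_M."""
    if not 0.0 <= solvent_fraction < 1.0:
        raise ValueError("solvent fraction must be in [0, 1)")
    return 1.23 / (1.0 - solvent_fraction)


def solvent_from_matthews(vm: float) -> float:
    """Inverse relation: solvent fraction from V_M (A^3/Da)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    for label, solvent in (("present crystals", 0.55), ("previous crystals", 0.65)):
        print(f"{label}: V_M = {matthews_from_solvent(solvent):.2f} A^3/Da")
```

The lower V_M for the new crystals (about 2.7 rather than about 3.5 Å³/Da) means less cell volume per dalton of protein, consistent with the tighter packing described above.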

The overall structure of the LH1-RC complex is similar to that previously 
determined at 3.0 Å resolution (Fig. 1). However, the root mean 
square deviation (r.m.s.d.) between Cα atoms of the two structures 
is 1.68 Å; this relatively large value arises mainly from deviations in 
some regions of the RC subunits and in the N and C termini of the 
LH1 subunits. We identified a number of lipid and detergent molecules 
in the 'gap region' between RC and LH1 (Fig. 1c), resulting in a 
rather crowded gap region compared with that of the previous 
structure, which was relatively empty (Extended Data Fig. 3). The RC 
contains four protein subunits, four BChls a, two bacteriopheophytins, 
one Mg²⁺ ion, one Fe²⁺ ion, one spirilloxanthin molecule, one molecule 
of menaquinone-8 (MQ8) and five molecules of ubiquinone-8 
(UQ8). The LH1 contains 16 pairs of LH1 αβ-subunits, 32 BChls a, 
16 spirilloxanthin molecules and 16 Ca²⁺ ions (Fig. 1, Extended Data 
Table 2). In addition, nearly 1,000 water molecules were found in the 
supercomplex, mostly distributed on the hydrophilic cytoplasmic and 
periplasmic surfaces (Fig. 1b). LH1-RC is elliptical in shape, as 
has been reported previously, but the distances between the BChls of 
the RC and the closest LH1 BChls are almost equal across the whole 
structure (Fig. 1d). This could ensure efficient energy transfer from 
LH1 to RC. 

The r.m.s.d. values between Cα atoms of the new RC structure and 
the RC-only structure at 2.2 Å resolution, or the RC structure at 3.0 Å 
resolution, were 3.19 Å and 1.98 Å, respectively, owing to apparent 
differences in some regions of the RC structures (Extended Data Figs. 3b, 4). 
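An r.m.s.d. over Cα atoms is the square root of the mean squared distance between matched atoms after the structures have been superposed. The sketch below assumes the coordinates are already superposed and matched one-to-one; the superposition step (for example, a Kabsch fit) is omitted. The toy numbers are illustrative only. They show how a few strongly displaced atoms, such as those in flexible termini, can raise the value even when most of the model agrees closely.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched, pre-superposed coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


# Toy example: 100 matched C-alpha positions; 95 are offset by 0.5 A and the
# last 5 (a hypothetical "flexible terminus") by 6 A. Numbers are illustrative.
ref = [(float(i), 0.0, 0.0) for i in range(100)]
mov = [(x, 0.5, 0.0) for x, _, _ in ref[:95]] + [(x, 6.0, 0.0) for x, _, _ in ref[95:]]
print(f"core only : {rmsd(ref[:95], mov[:95]):.2f} A")
print(f"all atoms : {rmsd(ref, mov):.2f} A")
```

In this toy case, 5% of the atoms nearly triple the r.m.s.d. (0.50 Å versus about 1.43 Å). A region-by-region comparison is therefore more informative than a single global value.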
Fig. 1 | Architecture of LH1-RC from Tch. tepidum at a resolution 
of 1.9 Å. a, View from the direction parallel to the membrane plane. 
LH1 α-subunit, blue; LH1 β-subunit, light cyan; C subunit, purple. 
b, Arrangement of the cofactors and water molecules, with the same view 
as in a. c, Arrangement of the cofactors, viewed perpendicular to the 
membrane. Protein subunits are depicted in light grey. d, Distances of the 
closest BChl pairs between the RC and the surrounding LH1 ring. Colour 
codes for cofactors: BChls, green; spirilloxanthin, yellow; Ca²⁺ ions, 
orange spheres; water molecules, pale pink dots. 


The C-terminal loop regions of all LH1 α-apoproteins, which had 
poor electron densities in the previous 3.0 Å structure, show relatively 
large deviations (Extended Data Fig. 4c, d). Although these terminal 
regions are exposed at the membrane surface and are partly flexible, 
as shown by their high B-factors (Extended Data Table 3), the 
electron-density map allowed us to adjust the lengths of the α- and 
β-apoproteins with confidence. Furthermore, the positions and coordination 
patterns of the calcium ions were identified unambiguously. Next, we 
describe features of the present high-resolution structure that are 
unique and important for the function of this supercomplex. 


Unique features in the structure of RC 

Three major differences were found between the present intact RC 
and the previous isolated RC structures: the N-terminal region of the 
cytochrome (Cyt) subunit, a loop region of the Cyt subunit (residues 
172-196) and a loop region of the H subunit (residues 44-58) (Fig. 2a, c, d). 
The Cyt subunit is located at the periplasmic side, and has been 
reported to be a lipoprotein in Blastochloris viridis, with its N-terminal 
cysteine linked to a diglyceride via a thioether bond. In the previous 
structure, this region was assigned incorrectly because of the 
lower resolution, or possibly because of X-ray radiation damage 
that could break the thioether bond. In the present structure, the 
electron-density map showed three partial acyl chains attached to 
Cys23 of the Cyt subunit. Among these, one is a single chain and 
another is clearly branched (Fig. 2b). This suggests that the 
N-terminal cysteine is triacylated with N-acyl and S-diacylglycerol groups 
in a manner similar to that of an outer membrane protein, and that 
these fatty acids anchor the subunit in the membrane. Beyond the 
seventh carbon from the carbonyl carbon, all aliphatic tails are 
disordered, presumably owing to their flexibility, and could interact 
with UQ8, which was found in a location appropriate for its forthcoming 
exchange with QB. In addition, the loop region (172-196) of 
Cyt showed a large deviation from that of the isolated RCs, and 
appears to interact with the neighbouring LH1 only in this 
conformation (Fig. 2c). A Mg²⁺ ion was found in its vicinity (Fig. 2c, e), 
which may reduce the flexibility of this long loop. 

The loop region (44-58) of the H subunit was traced unambiguously 
in the present structure (Fig. 2d), and interacts with the neighbouring 
LH1 β-polypeptide at the cytoplasmic side. This differs markedly from 
the isolated RCs and the previous LH1-RC complex. In the recently 
reported 1.92 Å resolution structure of the RC from Blc. viridis, these 
residues are located in a crystal lattice contact region, which differs 
from their location in previous structures. The interactions 
with the LH1 polypeptides therefore seem to be required to maintain 
this region in the correct configuration. 


Quinones and lipids 

One molecule of MQ8 and five molecules of UQ8 were found in the 
LH1-RC supercomplex (Fig. 3a), consistent with the results of 
biochemical analysis of the same sample. Among these quinones, MQ8 
and one of the UQ8 molecules function as the primary (QA) and 
secondary (QB) quinone acceptors, respectively, with binding sites similar 
to those reported previously. The additional UQ8 molecules were 
found to be distributed over the RC and in the gap region between 
the RC and LH1. In particular, one of the UQ8 molecules was located 
close to the isoprenoid tail of QB near the periplasmic 
surface (Fig. 3a), with its head oriented in the same direction as that 
of QB, suggesting that this quinone is positioned appropriately to 
exchange with QB after the latter's double reduction and protonation. This is 
supported by the fact that, whereas the head of QB is hydrogen bonded to 
L-His199, L-Ser232, L-Ile233 and L-Gly234, the 
head of the second UQ8 was not hydrogen bonded to any residues, 
and its isoprenoid tail was not visible, presumably owing to its high 
flexibility (Fig. 3a). The cavity containing the head of this UQ8 is 
formed by L-Met183, L-Leu184, L-Ser187, L-Trp272 and M-Ile179, as well 
as the accessory BChl bound to the M subunit. 

Fig. 2 | Differences between the isolated and intact RC core structures. 
a, Superposition of the RC structures from Tch. tepidum (1EYS; purple), 
Rba. sphaeroides (2J8C; orange), Blc. viridis (3T6E; yellow) and the present 
structure (blue). Regions similar in structure are coloured in grey, whereas 
the three regions with large differences are boxed and coloured differently. 
The boxed areas are enlarged in b, c and d, as indicated. b, Triacylation of 
the Cyt N-terminal cysteine. c, The loop region of Cyt (residues 172-196), 
including the Mg²⁺-binding site, which is circled here and enlarged in e. 
d, The N-terminal region and residues 44-58 of the H subunit. e, The 
Mg²⁺-binding site in the Cyt subunit. 

Three UQ8s are located in the gap region between LH1 and RC. 
Of these, two are close to the LH1 ring, and the isoprenoid tail of 
one of them was found inserted into a space between 
the LH1 α- and β-subunits (Fig. 3b, c). This suggests that it is being 
transported between the inside and outside of the ring through a 
putative exchange channel between the LH1 subunits. This channel 
is close to the cytoplasmic side of the membrane, at the same level as 
the head of QB, and has been suggested in previous studies, but the 
present result provides direct evidence for the transport of quinones 
through such channels. The channel is hydrophobic and is lined 
by Val20, Ser23, Ile24 and Phe27 from one LH1 α-subunit on one 
side, and by Leu21, Val22, Val25 and Ile29 from another α-subunit on 
the opposite side (Fig. 3c). In addition, the spirilloxanthin and the 
phytol tail of the BChl bound to the LH1 α-subunit are near the exit of 
the channel, and may contribute to the formation of its hydrophobic 
exit. 




Fig. 3 | Distribution of quinones and lipids in LH1-RC. a, Overall 
distribution of the six quinones in the present structure. UQs, green; MQ, 
cyan; UQ with its tail inserted in the channel between the LH1 α- and 
β-subunits, red. The two α- and two β-subunits of LH1 that form the 
channel for transport of the UQ are shown in orange and blue, respectively. 
b, c, The quinone-exchange channel between the LH1 α- and β-subunits. 
d, Distribution of lipids in a side view of LH1-RC. The lipid molecules 
are shown in space-filling mode. Oxygen, red; nitrogen, blue; PEF carbons, 
green; CDL carbons, yellow; phosphatidylglycerol carbons, magenta; 
proteins, grey. 


Extensive hydrogen-bonding networks, made possible by the large number 
of water molecules, were found in the H subunit connecting QB to the 
cytoplasmic surface (Extended Data Fig. 5). One of the 
major hydrogen-bonding networks is approximately perpendicular 
to the membrane surface and may serve as a proton-transfer channel 
connecting QB to the aqueous phase. We also found a water cluster 
parallel to the membrane plane, similar to that found in the Rba. 
sphaeroides RC. These hydrogen-bond networks support the idea that 
there are multi-entry proton-uptake networks for the protonation 
of QB. 
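The question of whether a hydrogen-bonded chain links QB to the surface can be framed as a graph search. Polar atoms (protein groups, waters, quinone oxygens) are the nodes, and pairs within a hydrogen-bonding distance cutoff are the edges. The sketch below runs a breadth-first search over such a graph to find the shortest chain. The 3.5 Å cutoff, the atom names and the coordinates are all hypothetical. A real analysis would also check donor/acceptor chemistry and geometry, not distance alone.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


def hbond_graph(atoms: Dict[str, Coord], cutoff: float = 3.5) -> Dict[str, List[str]]:
    """Connect every pair of atoms within `cutoff` (A); a distance-only proxy for H-bonds."""
    names = list(atoms)
    graph = {n: [] for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(atoms[a], atoms[b]) <= cutoff:
                graph[a].append(b)
                graph[b].append(a)
    return graph


def shortest_chain(graph: Dict[str, List[str]], start: str, goals) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain from `start` to any node in `goals`."""
    goals = set(goals)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in goals:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical atoms: a quinone oxygen, three waters, a surface residue and a stray water.
atoms = {
    "QB_O4": (0.0, 0.0, 0.0),
    "W1": (2.8, 0.0, 0.0),
    "W2": (5.6, 0.5, 0.0),
    "W3": (8.3, 1.0, 0.0),
    "SURFACE_OG": (11.0, 1.2, 0.0),
    "ISOLATED_W": (0.0, 10.0, 0.0),
}
g = hbond_graph(atoms)
print(shortest_chain(g, "QB_O4", {"SURFACE_OG"}))
```

Breadth-first search returns a chain with the fewest hops. Running it from several surface entry points is one way to probe whether more than one proton-uptake route exists.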

Among the 21 lipids identified, nine are tentatively assigned to 
cardiolipins (CDLs), ten to phosphatidylglycerols and two to 
phosphatidylethanolamines (PEFs) (Fig. 3d, Extended Data Table 2). 
This number is much larger than for the RC-only structure, in which 
only 1-2 lipids were found, presumably owing to the loss or disorder of 
lipid molecules upon solubilization of the RC-only complex, or their 
replacement by the detergents used. The distribution of the lipids is 
asymmetric: all of the CDLs were found at the cytoplasmic side, 
with their head groups localized at the membrane surface, 
whereas PEFs and phosphatidylglycerols were located at both the 
cytoplasmic and the periplasmic sides. All three lipids that 
were previously reported at 3.0 Å resolution were confirmed, 
but with some modifications: the PEF and one of the 
two phosphatidylglycerols were reassigned as CDL and PEF, 
respectively. 


Interactions of LH1-RC and interactions among LH1 

LH1 and RC interact at both the periplasmic and the cytoplasmic sides, 
either directly (Fig. 4a, b) or indirectly (for example, through lipids; 
Extended Data Table 4). As discussed, the newly built loop region 
(172-196) of Cyt forms two hydrogen bonds involving C-Ser176 and 
C-Gly177 and the neighbouring α-Asp48 and α-Ser41; this may stabilize 
this region of the Cyt subunit, indicating that this conformation 
represents its intact state. In addition, M-Leu109, C-Arg47, L-Arg85 and 
H-His7 interact with the neighbouring LH1 α-Ser41 or α-Asp48 at 
the periplasmic side (Extended Data Fig. 6a-c). Importantly, α-Ser41 
and α-Asp48 are two of the main residues involved in the interaction 
between LH1 and RC subunits or lipids at the periplasmic side 
(Extended Data Table 4). At the cytoplasmic side, two arginine residues 
(Arg18 and Arg19) located at the beginning of the α-subunit helices 
have a major role in interacting with the RC subunits or lipids, and 
other residues (Ile14, Asp16 and Ser23) are also involved in interactions 
with the RC and lipids (Fig. 4b, Extended Data Fig. 6d-f, Extended 
Data Table 4). These two arginines, together with some other arginines 
and lysines from the RC and α-Lys10 or β-Lys15, form a positively 
charged layer at the membrane surface. This layer might interact with the 
phosphate groups of the lipids, thereby strengthening the association of 
LH1 with the RC. 

Extensive interactions between the LH1 αβ-heterodimers are found 
near the Ca²⁺-binding site in the C-terminal domain on the 
periplasmic side, especially in the region of α-subunit residues 42-49 (Fig. 4c). 
This region forms a characteristic short turn at the surface 
of the membrane; this turn occurs only in Tch. tepidum, because the 
residue at position 43 is deleted in this organism compared with 
other photosynthetic bacteria. This characteristic structure enables 
α-Asp43 to form hydrogen bonds with α-Asp48, α-Ser54, α-Tyr55 
and α-Gln56 from the neighbouring α-polypeptide. In addition, 
α-Asn45 is a key residue because it is conserved in almost all 
purple bacteria and forms extensive interactions with its neighbouring 
subunits. β-Arg43 is also highly conserved and is involved in 
additional interactions with its neighbouring subunits (Extended Data 
Table 4). Another strong hydrogen bond was found between β-Pro44 
and α-Tyr55. By contrast, for the β-β interactions, only one strong 
hydrogen bond was found, between β-Leu46 and (n - 1) β-Tyr42 at the 
membrane surface. 

In contrast to the extensive interactions in the C-terminal region 
on the periplasmic side, there are fewer interactions in the N-terminal 
region on the cytoplasmic side. β-Thr7 is hydrogen bonded to α-Leu13 
and α-Trp12, and β-Asp11 interacts with α-Tyr9 and α-Lys10 (Fig. 4c). 
Taken together, the interactions in both the N-terminal and C-terminal 
regions ensure a tight connection of the LH1 αβ-apoproteins 
as well as joint coordination of BChls a, carotenoids and Ca²⁺ ions 
(Fig. 5a, b), which results in a robust, closed, concentric elliptical ring 
structure. 

Fig. 4 | Interactions between LH1 and RC, and among LH1. For clarity, 
only protein-protein interactions are depicted. a, b, Interaction sites 
(boxed) between LH1-α and RC at the periplasmic side (a) and the 
cytoplasmic side (b). The numbers 1-6 in the boxed areas correspond 
to panels a-f in Extended Data Fig. 6. L subunit, magenta; M subunit, 
blue; C subunit, green; H subunit, yellow; LH1 α- and β-subunits, light 
grey. c, Interactions between adjacent LH1 α-α and α-β subunits. LH1 
α-subunits, magenta and dark cyan; LH1 β-subunits, dark yellow and 
cyan; BChl, green; spirilloxanthin, light yellow. 


Ca²⁺-binding sites

One of the notable features of the thermophilic LH1-RC is its binding of 16 Ca²⁺ ions in the LH1 subunits. We identified all of the ligands for Ca²⁺ unambiguously. They are the side chain of α-Asp49, the carbonyl oxygens of α-Trp46, α-Ile51 and (n+1) β-Trp45, and two water molecules, giving rise to a six-coordinate structure (Fig. 5c, d). Three of the four coordinating residues (α-Trp46, α-Ile51 and β-Trp45) are hydrophobic. This might be because the binding site is located just at the periplasmic surface of the membrane, which may contribute to the weak binding of Ca²⁺ and render it easily exchangeable with other divalent cations.

The Ca²⁺-binding site is located in the C-terminal region of both the α- and β-subunits, which contributes to the tight connection of the two LH1 subunits (Fig. 5). This is in accordance with the results of Fourier-transform infrared spectroscopy measurements, which showed that the binding of Ca²⁺ reduces the conformational flexibility of LH1-RC. The structural stability induced by Ca²⁺ binding may therefore contribute both to the thermostability of LH1 and to the redshift of its absorption peak, two unique features of Tch. tepidum. These results are in agreement with those of recent spectral measurements.

Fig. 5 | Calcium-binding sites in LH1. a, Binding pattern of 16 Ca²⁺ ions in LH1. LH1 α-subunits, green; LH1 β-subunits, blue. b, The positions of two Ca²⁺ ions in the LH1 subunits. Ca²⁺, red spheres; BChl a, green sticks. c, A close-up view of the Ca²⁺-binding site. α-subunit, green; β-subunit, pink. d, Top view of the expanded region of the Ca²⁺-binding site.

The unique binding of Ca²⁺ may be related to the deletion of residue α-43 in Tch. tepidum. Insertion of an alanine at this site has been shown to disrupt Ca²⁺ binding in the thermophilic LH1, leading to a blueshift in its absorption. This supports both the Ca²⁺-binding environment revealed in the present study and its functional importance.

Differences were also found in the BChls of LH1 between the present and previous structures. The imidazole ring of β-His36, a direct ligand for β-BChl, is rotated by about 45° in most cases. In addition, the porphyrin plane of the β-BChl is rotated by about 10° about its Qy axis (the axis connecting pyrrole rings I and III). These changes result in a more parallel orientation of neighbouring BChls, giving rise to stronger coupling between adjacent BChls.
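Why a more parallel arrangement strengthens coupling can be illustrated with the point-dipole approximation. In that model, the coupling between two transition dipoles scales with the orientation factor κ = û1·û2 − 3(û1·R̂)(û2·R̂) divided by R³. The sketch below is a toy geometric illustration under that assumption. It is not the analysis used in this study, and the dipole directions, separation axis and tilt angles are hypothetical. It tilts one dipole out of parallel. It does not model the specific β-His36 and porphyrin-plane rotations reported above.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def rotate(v, axis, degrees):
    """Rotate vector v about 'axis' by 'degrees' using Rodrigues' rotation formula."""
    k = unit(axis)
    v = np.asarray(v, dtype=float)
    t = np.radians(degrees)
    return v * np.cos(t) + np.cross(k, v) * np.sin(t) + k * np.dot(k, v) * (1.0 - np.cos(t))


def orientation_factor(mu1, mu2, r):
    """Point-dipole orientation factor kappa for dipoles mu1, mu2 separated by vector r."""
    u1, u2, rh = unit(mu1), unit(mu2), unit(r)
    return float(u1 @ u2 - 3.0 * (u1 @ rh) * (u2 @ rh))


# Hypothetical geometry: two side-by-side dipoles along x, separated along y.
mu_a = np.array([1.0, 0.0, 0.0])
sep = np.array([0.0, 1.0, 0.0])
for tilt in (0.0, 15.0, 30.0):
    # Tilt the second dipole about the separation axis; kappa falls off as cos(tilt).
    mu_b = rotate(mu_a, sep, tilt)
    print(f"tilt {tilt:4.1f} deg -> kappa = {orientation_factor(mu_a, mu_b, sep):.3f}")
```

In this side-by-side geometry κ reduces to the cosine of the tilt angle, so the coupling is largest when the two dipoles are parallel.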

This high-resolution structural analysis reveals several novel features of LH1-RC. These include the unique conformations of several loop regions of the Cyt and H subunits and their interaction sites with LH1, the location of additional ubiquinones, water clusters that form hydrogen-bonding networks, and the detailed Ca²⁺-binding site. Together, they provide important information on four points: energy transfer between LH1 and the RC, the shuttling of ubiquinone through LH1 to the cytochrome bc1 complex, proton transfer to QB, and the roles of Ca²⁺ in the redshift and high thermostability of LH1-RC from Tch. tepidum.


Online content 

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0002-9.
METHODS


No statistical method was used to predetermine the sample size. The experiments 
were not randomized, and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Purification and crystallization. Tch. tepidum cells were grown in a growth chamber (BIOTRON, LH-410PFP-SP, NK System) at 49 °C for 7 days. Illumination was provided by LED lamps designed for plant growth, with emission peaks at around 450 nm and 645 nm, at a light intensity of 30 µE m⁻² s⁻¹. According to the absorption spectra, cells grown under these conditions had a larger LH1-RC/LH2 ratio, which suggests that the same amount of wet cells contains more LH1-RC. The LH1-RC complex was purified as described previously, with slight modifications. The final LH1-RC samples with an A915/A280 ratio greater than 2.20 were collected. They were precipitated by adding polyethylene glycol 1,450 to a final concentration of 13% (w/v), and then suspended in 20 mM MES (pH 6.2) containing 3.4% n-octyl-phosphocholine (OPC) to a concentration of 20 mg protein per ml. Crystallization was performed by the microbatch-under-oil method. In this method, 2 µl of the above protein solution was mixed with an equal volume of precipitant solution containing 50 mM MES (pH 6.2), 50 mM CaCl₂, 10 mM MgCl₂, 3.4% OPC and 26% polyethylene glycol 1,450. The crystals grew to sizes of 0.3 × 0.4 × 0.05 mm³ to 0.4 × 0.8 × 0.2 mm³ in 10 days at 20 °C (Extended Data Fig. 1a). They were then transferred into a cryoprotectant solution containing 50 mM MES (pH 6.2), 3.4% OPC, 30% polyethylene glycol 1,450, 50 mM CaCl₂ and 15% glycerol, and immediately flash-frozen in a nitrogen stream.
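Mixing the protein solution and the precipitant 1:1 halves every component that is present in only one of them. The sketch below makes the drop composition explicit. It assumes ideal mixing with additive volumes and ignores any polyethylene glycol carried over from the precipitation step. The dictionary keys are illustrative labels, not names from the original protocol.

```python
def mix_equal_volumes(a, b):
    """Concentrations after mixing equal volumes of two solutions (ideal, additive volumes)."""
    components = set(a) | set(b)
    # A component absent from one solution contributes 0 from that side.
    return {name: (a.get(name, 0.0) + b.get(name, 0.0)) / 2.0 for name in sorted(components)}


# Stock compositions as given in the text; units are part of each key.
protein_solution = {"protein (mg/ml)": 20.0, "MES (mM)": 20.0, "OPC (%)": 3.4}
precipitant = {
    "MES (mM)": 50.0,
    "CaCl2 (mM)": 50.0,
    "MgCl2 (mM)": 10.0,
    "OPC (%)": 3.4,
    "PEG 1,450 (%)": 26.0,
}

drop = mix_equal_volumes(protein_solution, precipitant)
for name, value in drop.items():
    print(f"{name:>15}: {value:g}")
```

Under these assumptions, the 2 µl + 2 µl drop starts at about 10 mg/ml protein, 13% PEG 1,450, 25 mM CaCl₂ and 5 mM MgCl₂, with OPC unchanged at 3.4%.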

Data collection. X-ray diffraction experiments were carried out at beamlines BL41XU of SPring-8 and BL1A of the Photon Factory (Japan). The highest-resolution diffraction data used for structural analysis were collected at BL41XU of SPring-8. The X-ray wavelength was 1.0 Å and the beam size was 35 × 22 µm². The diffraction images were recorded with a Pilatus 6M detector, and the crystals were rotated by 0.1° per image in a helical manner. A total of 5,400 images covering a rotation angle of 540° were collected. The photon flux of the beamline was 6.8 × 10¹¹ photons per second (after attenuation by 0.75-mm-thick aluminium), and the exposure time was 0.1 s for each diffraction image. The diffraction data were processed, integrated and scaled using the XDS Program Package (version October 15, 2015), and the reflection data statistics are summarized in Extended Data Table 1.
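The acquisition parameters are internally consistent, and a few lines of arithmetic make this explicit. The sketch assumes continuous rotation during each 0.1 s exposure and a constant flux. It ignores the helical translation and any beam decay, so the photon total is a rough count of incident photons, not an absorbed dose.

```python
# Acquisition parameters as reported in the text.
n_images = 5400               # number of diffraction images
oscillation_deg = 0.1         # rotation per image (degrees)
exposure_s = 0.1              # exposure per image (seconds)
flux_photons_per_s = 6.8e11   # after attenuation by 0.75-mm aluminium

total_rotation_deg = n_images * oscillation_deg
total_exposure_s = n_images * exposure_s
total_photons = flux_photons_per_s * total_exposure_s   # assumes constant flux
degrees_per_second = oscillation_deg / exposure_s       # implied rotation speed

print(f"total rotation : {total_rotation_deg:.0f} deg")
print(f"total exposure : {total_exposure_s:.0f} s")
print(f"incident photons (approx.): {total_photons:.3e}")
print(f"rotation speed : {degrees_per_second:.1f} deg/s")
```

The 540° sweep follows directly from 5,400 images × 0.1°. The same numbers imply about 540 s of total exposure and roughly 3.7 × 10¹⁴ incident photons.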

Structure refinement. The initial structure of the LH1-RC complex was solved by molecular replacement using the Phaser program in PHENIX (version 1.12-2829). The structure of LH1-RC from Tch. tepidum previously determined at 3.0 Å resolution (PDB code: 3WMM) was used as the search model, with the Ca²⁺ ions, lipid and solvent molecules omitted. Five per cent of the reflections were used for the free R factor calculation in the structure refinement. The initial model was subjected to rigid-body and restrained refinements successively in the resolution range of 50-2.0 Å. Incorporation of cofactors, lipid and detergent molecules and model modification were performed using COOT (version 0.8.2). For the assignment of lipid molecules, the positions of the phosphorus atoms in the lipids were confirmed either by strong electron density interacting with positively charged amino acid residues or by peaks in the anomalous map. The lipids were assigned on the basis of the electron density of the polar head group. They were assigned as CDLs when they were connected to each other, and as PEFs and phosphatidylglycerols when they interacted with negatively charged amino acids and neutral groups, respectively. Positional and isotropic parameters were refined in the resolution range of 50-1.9 Å. After solvent molecules were included in the model, TLS (translation, libration, screw) refinement was performed. The final model was refined to Rwork = 18.15% and Rfree = 21.52%, with 98.42% of residues in the favoured Ramachandran region, 1.51% in the allowed region and 0.08% as outliers. The relatively high R values may be attributed to blurred electron density at the terminal regions of the LH1 polypeptides, especially the N terminus, which results in higher B-factors in these regions. In addition, some residual densities in the gap region between the RC and the LH1 ring were not modelled; they may be flexible fragments of lipids and quinones. The refinement statistics are listed in Extended Data Table 1, and the quality of the structure was analysed using PROCHECK. Figures were generated with the PyMOL program.
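The Rwork/Rfree figures above come from the standard crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. Rwork is evaluated on the reflections used in refinement, and Rfree on a held-out test set (here 5%). The sketch below shows both calculations on synthetic amplitudes. It assumes Fcalc is already on the same scale as Fobs and picks the test set uniformly at random. Refinement programs such as PHENIX have their own scaling and test-set selection, so this is not their implementation.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); assumes Fcalc is already scaled to Fobs."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))


def choose_free_set(n_reflections, fraction=0.05, seed=0):
    """Flag a random fraction of reflections as the test (free) set."""
    rng = np.random.default_rng(seed)
    flags = np.zeros(n_reflections, dtype=bool)
    n_free = int(round(fraction * n_reflections))
    flags[rng.choice(n_reflections, size=n_free, replace=False)] = True
    return flags


# Synthetic amplitudes standing in for a real reflection file (hypothetical data).
rng = np.random.default_rng(1)
f_obs = rng.uniform(10.0, 100.0, size=2000)
f_calc = f_obs * rng.normal(1.0, 0.2, size=f_obs.size)
free = choose_free_set(f_obs.size)

r_work = r_factor(f_obs[~free], f_calc[~free])
r_free = r_factor(f_obs[free], f_calc[free])
print(f"test-set size: {free.sum()} of {free.size}")
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Here Fcalc is not fitted to the working set, so the two values come out similar. In a real refinement the model is optimized against the working reflections only, which is why Rfree normally ends up higher than Rwork, as in the 21.52% versus 18.15% reported above.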

Reporting Summary. Further information on experimental design is available in 
the Reporting Summary linked to this paper. 

Data availability. Atomic coordinates and structure factors for the reported crystal 
structure have been deposited in the Protein Data Bank under accession number 
5Y5S. 
Extended Data Fig. 1 | Quality of the LH1-RC crystal and its packing pattern. a, An image of the LH1-RC crystals obtained in the present study. These crystals were obtained reproducibly under the present crystallization conditions. b, A typical diffraction image of the LH1-RC crystal taken at BL41XU of SPring-8, Japan, with a wavelength of 1.0 Å at 100 K. This diffraction image was obtained reproducibly with many crystals tested. c, d, Packing patterns of the previous (c) and the present (d) crystals.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


ARTICLE 


Extended Data Fig. 2 | Close-up views of the electron density maps for some of the cofactors of LH1-RC. The blue mesh represents the 2Fo − Fc map contoured at 1.0σ, taken at a wavelength of 1.0 Å and analysed to 1.9 Å resolution. a-d, The special-pair BChls (a), one pair of the LH1 BChls (b), one of the CDLs (c) and the QB molecule (d).
Extended Data Fig. 3 | Comparison of the arrangement of the cofactors between the previous and present structures. a, Arrangement of the cofactors in the previous 3.0 Å structure, viewed from the top of the membrane. b, Superposition of the cofactors of the previous 3.0 Å and present 1.9 Å structures. c, The same as b, viewed from the side of the membrane. In b and c, the cofactors revealed in the present 1.9 Å structure are coloured differently, whereas those in the previous 3.0 Å structure are depicted in grey.
Extended Data Fig. 4 | Comparison of the protein structures between the previous and present structures. a, Superposition of the RC subunits of the previous 2.2 Å and the present 1.9 Å structures, with a side view from the membrane plane. b, Superposition of the RC subunits of the previous 3.0 Å and the present 1.9 Å structures, with a side view from the membrane plane. c, d, Superposition of the LH1 subunits of the previous 3.0 Å and present 1.9 Å structures, with a side view (c) and a top view (d) relative to the membrane plane. In all panels, the present 1.9 Å structure is coloured, whereas the previous structures are depicted in grey.
Extended Data Fig. 5 | Hydrogen-bonding networks for the protonation of QB. a, Two possible proton channels connecting QB to the cytoplasmic surface. The thick arrow (coloured in blue) indicates the main channel formed within the H-subunit, which is enlarged in b, and the thin arrow indicates the second channel. b, The main hydrogen-bonding network indicated by the thick arrow in a. It is formed by a number of water molecules and residues (green) from the H-subunit (pale cyan). QA and QB are depicted in violet and red, respectively, and the non-haem iron is depicted in deep purple. The hydrogen bonds are depicted as dashed lines. Water molecules participating in the hydrogen-bonding networks are depicted in orange, and those not participating are depicted in grey.
Extended Data Fig. 6 | Protein-protein interactions between LH1 and RC. a-c, Interactions between the LH1 α-subunits and the RC subunits at the periplasmic side. Panels a-c correspond to boxed areas 1-3 in Fig. 4. d-f, Interactions between the LH1 α-subunits and the RC subunits at the cytoplasmic side. Panels d-f correspond to boxed areas 4-6 in Fig. 4.
Extended Data Table 1 | Data collection and refinement statistics

                                  LH1-RC
Data collection
  Space group                     C121
  Cell dimensions
    a, b, c (Å)                   145.23, 143.81, 210.28
    α, β, γ (°)                   90.00, 90.74, 90.00
  Resolution (Å)                  46.92-1.90 (1.968-1.900)*
  Rmerge                          0.1035 (1.863)
  I/σI                            9.47 (1.14)
  Completeness (%)                99.95 (99.94)
  Redundancy                      9.2 (8.0)
Refinement
  Resolution (Å)                  46.92-1.90 (1.968-1.900)*
  No. reflections                 338,536 (33,812)
  Rwork/Rfree                     0.1815 (0.3558)/0.2152 (0.3737)
  No. atoms
    Protein                       22,003
    Ligand/ion                    5,022
    Water                         956
  B-factors                       65.45
    Protein                       63.68
    Ligand/ion                    73.62
    Water                         63.26
  R.m.s. deviations
    Bond lengths (Å)              0.018
    Bond angles (°)               1.77

*Values in parentheses are for the highest-resolution shell. The table was prepared using PHENIX with all automatic default settings.
Extended Data Table 2 | Components of LH1-RC determined at 1.9 Å resolution

Proteins    Cofactor           Numbers
            BChl a             4
            Spirilloxanthin    1
            Non-haem Fe        1
Extended Data Table 3 | Average B-factors of the RC subunits and the α- and β-apoproteins of LH1

For each LH1 β-apoprotein, LH1 α-apoprotein and RC subunit, the table lists the number of atoms and the average B-factor.
Extended Data Table 4 | Interactions between LH1 and RC

Periplasmic side
LH1-α        RC

Protein-protein
Asp48/3      Arg47/C
Ser41/5      Arg85/L
Asp48/I      His7/H
Ser41/S      Gly177/C
Asp48/U      Ser176/C
Ser41/Q      Leu109/M

Protein-lipid
Asp48/1
Asn45/1
Asp48/5
PGV39/a
Ser41/A
Asn45/D
PGV31/a
Leu40/D
Asn4oiK per 3sod/a
Leu40/K
Asp48
Tyr55
Gln56
Asn45        Gln56
PGV34/a
Asp43
snd [args

LH1-β        LH1-β

Cytoplasmic side
LH1-α        RC

Protein-protein
Arg19/5      Leu258/H
Arg18/9      Asp21/L
Ile14/A      Arg45/H
Arg19/D      Gly57/H
             Tyr134/M
Arg19/K      Glu138/M

Protein-lipid
Arg19/9
Arg18/A
Arg19/A
Ile14/A
Arg18/D      CDL30/a
Arg19/D
Ser23/F      CDL25/a
Arg19/K
Arg18/O      CDL27/a
Ser23/O
Arg19/O      PEF12/a
Asp16/Q      CDL28/a
aS           CDL29/a
Arg19/U
Arg18/Y      CDL24/a
Arg19/Y      CDL26/a

LH1-β
peer
Lys10
Trp12
Thr7
PGV40/a
CDL303/a
The gastric proton pump, the H⁺,K⁺-ATPase, is a P-type ATPase responsible for acidifying the gastric juice down to pH 1. This corresponds to a million-fold proton gradient across the membrane of the parietal cell, the steepest known cation gradient of any mammalian tissue. The H⁺,K⁺-ATPase is an important target for drugs that treat gastric acid-related diseases. Here we present crystal structures of the H⁺,K⁺-ATPase in complex with two blockers, vonoprazan and SCH28080, in the luminal-open state, at 2.8 Å resolution. The drugs have partially overlapping but clearly distinct binding modes in the middle of a conduit running from the gastric lumen to the cation-binding site. The crystal structures suggest that the tight configuration at the cation-binding site lowers the pKa value of Glu820 sufficiently to enable the release of a proton even into the pH 1 environment of the stomach.


The pH of stomach fluid decreases to around 1 in response to food intake. This highly acidic environment is generated by the gastric H⁺,K⁺-ATPase. It is indispensable for digestion and is also an important barrier to pathogens invading via the oral route. However, excessive stomach acidification leads to ulcers, which, although not life-threatening, considerably impair the health of affected individuals. Acid suppression in combination with antibiotics is the recognized treatment to eradicate Helicobacter pylori, a risk factor for gastric cancer. Proton pump inhibitors, such as omeprazole, are commonly used to treat acid-related diseases, as are K⁺-competitive acid blockers (P-CABs), a recently developed class of acid suppressants that includes vonoprazan. One compound, SCH28080, was found to be hepatotoxic and has therefore never been developed for clinical use. As a P-CAB prototype, however, SCH28080 has been used as a specific H⁺,K⁺-ATPase antagonist in vitro, and several related compounds are currently undergoing clinical trials. Gastric H⁺,K⁺-ATPase therefore remains a prominent target for drugs that treat excess stomach acid secretion.

As with other P-type ATPases, cation transport by gastric H⁺,K⁺-ATPase is accomplished by cyclical conformational changes of the enzyme (abbreviated as E). These changes are generally described using an E1/E2 nomenclature based on the Post-Albers scheme (Fig. 1). ATP-driven H⁺ export into the gastric lumen coupled with K⁺ uptake into the cytoplasm is electroneutral. The transport stoichiometry is thought to vary from 2H⁺/2K⁺ to 1H⁺/1K⁺ per ATP as the luminal pH decreases. A hallmark of the P-type ATPase family is the autophosphorylation of an invariant aspartate (Asp385 in H⁺,K⁺-ATPase) during the transport cycle to form a phosphoenzyme intermediate (E1P, E2P).

The H⁺,K⁺-ATPase comprises two subunits. Its catalytic α-subunit is highly homologous to those of related P2-type ATPases such as Na⁺,K⁺-ATPase and sarco(endo)plasmic reticulum Ca²⁺-ATPase (SERCA), which share 65% and 35% sequence identity, respectively, with the α-subunit of H⁺,K⁺-ATPase. The α-subunit comprises 10 transmembrane helices (TM1-TM10), which contain the cation-binding sites, and three large cytoplasmic domains: the nucleotide, phosphorylation and actuator domains. In addition to the α-subunit, H⁺,K⁺-ATPase and Na⁺,K⁺-ATPase require a type II membrane protein β-subunit for functional expression as an αβ-complex.
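Percentages such as the 65% and 35% identities above depend on how identity is counted over an alignment. The sketch below uses one common convention: identical positions divided by the aligned columns in which neither sequence has a gap. Other tools divide by the full alignment length or by the shorter sequence, so values are only comparable under the same convention. The two short sequences are hypothetical toy inputs, not ATPase sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical pre-aligned toy sequences (the gap column is skipped).
print(f"{percent_identity('MKVLA-TE', 'MKILAGTE'):.1f}% identity")
```

In the toy example, 6 of the 7 ungapped columns match, giving about 85.7%.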

H⁺,K⁺-ATPase pumps H⁺ from the neutral cytoplasm of the parietal cell (pH 7) into the acidic milieu of the stomach (down to pH 1; an approximately 10⁶-fold H⁺ gradient). Releasing H⁺ into a pH 1 environment is especially challenging, because the pKa of free carboxyl groups is normally about 3-5. The molecular mechanism underlying H⁺ transport into the stomach has long remained elusive. Here we describe the crystal structures of gastric H⁺,K⁺-ATPase in a luminal-open E2P conformation bound to vonoprazan or SCH28080, analysed at 2.8 Å resolution. These structures define the molecular interaction between P-CABs and H⁺,K⁺-ATPase, and reveal how H⁺,K⁺-ATPase expels H⁺ into the stomach even at pH 1.
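Two of the numbers above can be checked with a short calculation. A drop of six pH units is a 10⁶-fold difference in H⁺ activity. The Henderson-Hasselbalch relation shows why a carboxylate with an ordinary pKa (3-5) would stay almost fully protonated at pH 1, and so could not release its proton there. The sketch treats each carboxyl as an isolated acid in bulk solution, which is a simplifying assumption, not a model of the protein environment.

```python
def fold_gradient(ph_low, ph_high):
    """Ratio of H+ activities between two compartments: 10**(pH_high - pH_low)."""
    return 10.0 ** (ph_high - ph_low)


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


print(f"gradient, pH 7 vs pH 1: {fold_gradient(1.0, 7.0):.0e}-fold")
for pka in (3.0, 4.0, 5.0):
    # Fraction of an isolated carboxyl that has given up its proton at pH 1.
    print(f"pKa {pka}: deprotonated fraction at pH 1 = {fraction_deprotonated(pka, 1.0):.1e}")
```

At pH 1 only about 1% (pKa 3) to 0.01% (pKa 5) of such a group is deprotonated. Proton release into the stomach therefore requires the binding-site pKa to be pushed far below the usual range.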


Overall structure 

For the crystallization, we used pig gastric H⁺,K⁺-ATPase expressed in HEK293S cells using a baculovirus-mediated system. To avoid excess glycosylation of the six N-linked glycosylation sites located on the extracellular part of the β-subunit, we used the GnTI⁻ strain and included endoglycosidase treatment in the purification steps (Extended Data Fig. 1). Crystals were obtained in the presence of detergent and phospholipid, giving type I crystals (Extended Data Fig. 1, Extended Data Table 1), as with most other crystallized P-type ATPases. The asymmetric unit of the crystal contains one αβ complex, and several phospholipid and detergent molecules could be identified. Because crystals were grown in the presence of beryllium fluoride (BeF₃⁻) and P-CABs (vonoprazan or SCH28080), the enzyme adopted an E2P conformation, to which P-CABs preferentially bind (Fig. 1a; see also Supplementary Video 1). The vonoprazan-bound (here termed (Von)E2BeF) and SCH28080-bound (here termed (SCH)E2BeF) structures were virtually identical (Extended Data Fig. 2; root mean square deviation (r.m.s.d.) = 0.79 Å).
The overall structure of H⁺,K⁺-ATPase was very similar to the corresponding structures of SERCA Mg²⁺ E2BeF (r.m.s.d. 2.3 Å) and ouabain-bound Na⁺,K⁺-ATPase E2P (r.m.s.d. 1.3 Å). Here, similarity is defined by the relative orientations of the three cytoplasmic domains and the arrangement of the transmembrane helices. BeF₃⁻ mimics a bound phosphate and forms a covalent bond with the invariant Asp385 in the DKTG motif of the phosphorylation domain. Bound BeF₃⁻ is covered by a segment containing the TGES (Thr228-Gly229-Glu230-Ser231) motif from the actuator domain, which prevents spontaneous hydrolysis of the aspartylphosphate by bulk water. This is in marked contrast to the E2-P transition state captured in SERCA with aluminium fluoride (AlFₓ) (Fig. 1b). The position of the actuator domain tightens the linker region connecting TM2 and the actuator domain and elongates an α-helical segment (Extended Data Fig. 3). Because of this extension, the TM1-TM2 bundle assumes an upright position towards the cytoplasmic side, which is coupled with the laterally open position of the TM3-TM4 bundle. As a consequence, the transmembrane helices have a luminal-open arrangement (Fig. 1c). This contrasts markedly with the P-CAB-free, luminal-closed E2-P transition state of H⁺,K⁺-ATPase E2-AlF determined by electron crystallography at 6.5 Å resolution (see Extended Data Fig. 3e for a schematic of the conformational change that accompanies luminal gating). The gate opening enables P-CABs to access their binding site from the luminal side (Extended Data Fig. 2c, d). These structural features show that the present H⁺,K⁺-ATPase structures, (Von)E2BeF and (SCH)E2BeF, adopt a luminal-open E2P state stabilized by a bound P-CAB.

Fig. 1 | Crystal structure of gastric H⁺,K⁺-ATPase in complex with vonoprazan. a, Overall structure of the luminal-open E2P state of H⁺,K⁺-ATPase complexed with vonoprazan, (Von)E2BeF, in ribbon representation. The colour of the α-subunit gradually changes from the N terminus (blue) to the C terminus (red). Invariant Asp385, the phosphate analogue BeF₃⁻ (blue) in the phosphorylation domain and the TGES motif located at the edge of the actuator domain are highlighted as spheres. The colour of the β-subunit changes gradually from dark grey (N terminus in the cytoplasm) to light grey (C terminus in the gastric lumen). Two phospholipids (dioleoylphosphatidylcholine), two detergent molecules (octaethylene glycol monododecyl ether) and three N-linked N-acetylglucosamines are modelled in the structure (sticks). The approximate location of the membrane is indicated by black lines. The chemical structure of vonoprazan is shown in the upper left. The Post-Albers-type reaction scheme for H⁺,K⁺-ATPase is shown in the lower left; the inside and outside of the scheme represent the cytoplasmic and luminal sides of the parietal cell, respectively. A, actuator domain; N, nucleotide domain; P, phosphorylation domain; TM, transmembrane domain. b, Close-up view of the phosphorylation site. BeF₃⁻ (blue) bound to Asp385 (light blue) and coordinating Mg²⁺ (green) are shown as spheres. H⁺,K⁺-ATPase (Von)E2BeF (light blue), SERCA E2BeF (RCSB Protein Data Bank (PDB) code: 3B9B, yellow) and SERCA E2-AlF (PDB code: 2ZBG, pink) are superimposed according to their phosphorylation domain structures. The TGES motif in each structure is indicated by dark colours (only amino acids of H⁺,K⁺-ATPase are shown in stick representation). c, Comparison of the transmembrane helices (TM1-TM10 for the α-subunit, and TMβ for the β-subunit) between luminal-open (Von)E2BeF (coloured ribbons as in a) and the low-resolution model of luminal-closed, P-CAB-free E2-AlF of H⁺,K⁺-ATPase (grey ribbons). A cross section of the luminal transmembrane region parallel to the membrane plane is shown, viewed from the luminal side of the membrane. Arrows in b and c indicate the displacement of the TGES motif and of the indicated transmembrane helices associated with luminal gate closure (E2BeF to E2-AlF).

Fig. 2 | P-CAB-binding site. Bound SCH28080 is superimposed on the (Von)E2BeF structure to show their partially overlapping binding sites in the vicinity of Tyr799. The transmembrane helices are shown as ribbons coloured as in Fig. 1a, except that TM5 is shown in green for clarity. Amino acids involved in P-CAB coordination are shown as sticks. Structures are viewed approximately parallel to the membrane plane (a) or perpendicular to the membrane plane from the luminal side (b).
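The r.m.s.d. values quoted for comparing the structures are distances computed after an optimal rigid-body superposition. The sketch below uses the Kabsch algorithm (SVD of the covariance matrix, with a correction that rules out reflections), which is one standard way to obtain that superposition. It assumes the two coordinate sets are already matched atom-for-atom. Published values also depend on which atoms are selected and on any outlier rejection, so this is not the exact procedure used for the numbers above. The coordinates are synthetic.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched N x 3 coordinate sets after optimal rigid-body superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)          # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                   # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fit = p_c @ rotation.T          # apply rotation to each row vector
    return float(np.sqrt(np.mean(np.sum((p_fit - q_c) ** 2, axis=1))))


# Hypothetical example: 100 random "C-alpha" positions, rigidly moved and perturbed by noise.
rng = np.random.default_rng(42)
model_a = rng.normal(scale=10.0, size=(100, 3))
theta = np.radians(25.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
shift = np.array([3.0, -1.0, 7.0])
model_b = model_a @ rot_z.T + shift + rng.normal(scale=0.5, size=model_a.shape)
print(f"r.m.s.d. after superposition: {kabsch_rmsd(model_a, model_b):.2f} Å")
```

A pure rotation plus translation gives an r.m.s.d. of essentially zero, so any remaining value reflects genuine differences in the coordinates, here the added noise.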


Binding site for P-CAB 

The electron density maps define the binding modes of the two P-CABs, vonoprazan and SCH28080, and the residues coordinating them. Both bind in a luminal-facing conduit that extends to the cation-binding site (Fig. 2, Extended Data Figs. 4, 5 and Extended Data Table 2). This location is consistent with K⁺-competitive inhibition of H⁺,K⁺-ATPase activity through blocking K⁺ entry to the cation-binding site. The binding sites of vonoprazan and SCH28080 were previously thought to overlap on the basis of their similar inhibitory actions. Our structures show that they do indeed partially overlap but are also distinct, as detailed in Supplementary Information.


The gating latch 

Extensive studies of P2-type ATPases and electron crystallographic structures of H⁺,K⁺-ATPase have revealed the conformational changes required to regulate the luminal gate (Extended Data Fig. 3e). Luminal gating is mostly operated by the TM1-TM2 and TM3-TM4 bundles. It is allosterically regulated by the bound phosphate on the phosphorylation domain and by the coordinating TGES loop on the actuator domain (Fig. 1b), which connects to TM1 and TM2. The key conformational change required for luminal gating is the vertical sliding movement of the TM1-TM2 bundle relative to TM3-TM4 (Extended Data Fig. 3). In the luminal-open, P-CAB-bound E2P state of H⁺,K⁺-ATPase, the side chain of Ile119 in TM1 lies on top of the side chain of Met334 in the luminal portion of TM4, acting as a latch, and the TM1-TM2 bundle is held in an upright position towards the cytoplasmic side (Fig. 3a, b). If the arrangement is indeed latch-like, substituting either Ile119 or Met334 with alanine, which has a smaller side chain, would let the TM1-TM2 bundle slip spontaneously towards the luminal side and close the luminal gate. The gate-closure signal would in turn be transmitted to the actuator domain, triggering a change
in the coordination between the TGES motif and the bound phosphate (Fig. 1b), and a bulk water molecule subsequently introduced into the phosphorylation centre would induce dephosphorylation. In the native transport cycle, this sequence of events is triggered by the binding of the counter-transported K⁺ to the cation-binding site, which is the rate-limiting step. This is why H⁺,K⁺-ATPase activity is accelerated in a K⁺-dependent manner, as seen in the wild-type enzyme (Fig. 3c). By contrast, the Ile119Ala and Met334Ala mutants show nearly K⁺-independent, constitutive ATPase activity (Fig. 3c and Extended Data Table 3). These amino acids are located far from the cation-binding site, so these mutations are unlikely to affect the properties of the cation-binding site. On the other hand, the Ile119Met and Met334Ile mutants, as well as the Ile119Met/Met334Ile double mutant, all retain large hydrophobic side chains and exhibit normal K⁺-dependent ATPase activity, as seen in the wild-type enzyme. We therefore conclude that in the luminal-open state these hydrophobic amino acids act as a latch that prevents a sliding movement of the TM1-TM2 bundle with respect to the luminal portion of TM4. They are therefore important for tight coupling between K⁺ binding and dephosphorylation (Extended Data Fig. 3e). A comparison of the amino acid sequences of P2-type ATPases reveals that the residues at the corresponding positions in TM1 and in the luminal portion of TM4 are all bulky and hydrophobic (Fig. 3d). This suggests that the mechanism of luminal gate regulation described here is conserved among P2-type ATPases.

Fig. 3 | A gating latch. a, b, Comparison of the luminal-open (Von)E2BeF (coloured ribbons) and the low-resolution E2-AlF model (grey ribbons; luminal-closed E2-P transition state) of H⁺,K⁺-ATPase. Ile119 and Met334 are shown in stick representation. Arrows indicate the displacement of the Ile119 and Met334 Cα positions (indicated in red on the TM1 and TM4 helices) from the luminal-open to the luminal-closed form. Structures are viewed from the luminal side (a) and approximately parallel to the membrane plane (b). c, K⁺-dependent ATPase activities of the wild-type enzyme and the indicated mutants. Data plotted were corrected for background values measured in the absence of K⁺ and the presence of 10 µM SCH28080, and normalized to their maximum velocity set to 100%. The value in the absence of KCl therefore indicates the H⁺-ATPase activity, which corresponds to the rate of spontaneous E2P dephosphorylation. Individual data from triplicate points at eight K⁺ concentrations were plotted. Representative results from more than three independent measurements for each mutant are shown. d, Sequence alignment of pig gastric H⁺,K⁺-ATPase (α1) with other P2-type ATPases. The positions of Ile119 and Met334 in pig gastric H⁺,K⁺-ATPase (highlighted in red) are framed with red boxes. TM4L, luminal portion of TM4; TM4C, cytoplasmic portion of TM4; WT, wild type.
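The Fig. 3c normalization (subtract a background rate, then scale so the maximum is 100%) is what makes the zero-K⁺ point a direct readout of K⁺-independent activity. The sketch below applies that normalization. It assumes the largest observed corrected rate stands in for the maximum velocity, whereas a fitted Vmax could also be used. All rates are hypothetical numbers chosen to mimic a K⁺-dependent wild type and a constitutively active latch mutant. They are not data from this study.

```python
import numpy as np


def normalize_activity(rates, background):
    """Subtract a background rate and scale so the largest corrected rate is 100%."""
    corrected = np.asarray(rates, dtype=float) - background
    return 100.0 * corrected / corrected.max()


# Hypothetical rates (arbitrary units) at eight K+ concentrations, lowest (0 mM K+) first.
wild_type = [5.0, 20.0, 45.0, 70.0, 85.0, 95.0, 100.0, 105.0]
latch_mutant = [80.0, 84.0, 88.0, 92.0, 95.0, 98.0, 100.0, 101.0]
background = 5.0  # hypothetical rate without K+ and with 10 uM SCH28080

for name, rates in (("WT", wild_type), ("mutant", latch_mutant)):
    norm = normalize_activity(rates, background)
    # A high value at 0 K+ indicates K+-independent (spontaneous) dephosphorylation.
    print(f"{name}: activity at 0 K+ = {norm[0]:.0f}% of maximum")
```

With these toy numbers, the wild type starts near 0% and the mutant near 78% of its maximum, which is the qualitative K⁺-dependent versus nearly K⁺-independent contrast described in the text.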
Fig. 4 | Cation-binding site in the luminal-open E2P state. a, Sequence alignment of the indicated transmembrane helices among related P2-type ATPases. The amino acids explicitly discussed here are highlighted in red. b, c, Close-up of the cation-binding site in H⁺,K⁺-ATPase (Von)E2BeF in stick representation, viewed approximately perpendicular to the membrane from the cytoplasmic side (b) and parallel to the membrane from the TM4 side (c). Dotted lines are shown between residues with less than 3.5 Å between neighbouring atoms, presumably representing hydrogen bonds or an electrostatic interaction (Lys791-Glu820). A water molecule (red) is also indicated. d, K⁺ dependence of the SCH28080-sensitive ATPase activity of the indicated mutant enzymes, determined as in Fig. 3c. Glu343Asp exhibited no detectable ATPase activity. Individual data from triplicate points at each of the indicated K⁺ concentrations were plotted, and representative results from more than three independent measurements for each mutant are shown.


Mechanism of proton extrusion 

According to the Post-Albers scheme (Fig. 1a), the luminal-open E2P state is an intermediate occurring just after proton release and poised for subsequent K+ binding. In fact, the cation-binding site is exposed to the luminal bulk medium when bound P-CAB is removed from our structures (Extended Data Fig. 2b, c). According to previous H+ transport measurements using inside-out vesicles prepared from pig stomach’, two protons are released in exchange for two K+ ions at neutral pH. However, in theory, under acidic conditions only a single proton can be transported, and only a single K+ counter-transported, per hydrolysed ATP®. This variable transport stoichiometry hypothesis for H+,K+-ATPase® therefore requires two distinct proton-binding sites with different pKa values, but the details of the molecular mechanism have remained elusive. Previous mutagenesis studies have demonstrated that Na+,K+-ATPase*® and H+,K+-ATPase”® transport Na+ and H+, respectively, generally using the same conserved carboxylic acids of their respective cation-binding sites in TM4 and TM6, except for a lysine residue in TM5 of H+,K+-ATPase (Fig. 4a). This lysine (Lys791 in the pig sequence), which is invariant among gastric H+,K+-ATPases, is a serine in Na+,K+-ATPase and SERCA, and is predicted to be important for H+,K+-ATPase properties such as proton transport”, net electroneutral cation transport”® and inherent E2 preference”®.
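The variable-stoichiometry argument can be made concrete with a toy calculation. The sketch below is not the authors' model: it assumes that each of two independent proton sites releases its H+ into the lumen with a probability equal to its deprotonated fraction at the luminal pH (Henderson-Hasselbalch equation), and the two pKa values are hypothetical numbers chosen only to show the qualitative behaviour.

```python
"""Toy model of variable H+/ATP stoichiometry with two proton sites.

Assumptions (illustrative, not taken from the paper):
  * each site releases its proton into the lumen with a probability equal
    to its deprotonated fraction at the luminal pH (Henderson-Hasselbalch);
  * the two sites act independently;
  * the pKa values below are hypothetical.
"""
from typing import Iterable


def deprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of an acidic group that is deprotonated at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def protons_per_cycle(luminal_ph: float, site_pkas: Iterable[float]) -> float:
    """Expected number of H+ released per transport cycle."""
    return sum(deprotonated_fraction(pka, luminal_ph) for pka in site_pkas)


# Hypothetical pKa values: a very acidic site (low pKa, still releasing at
# gastric pH) and a site that releases only near neutral pH.
HYPOTHETICAL_PKAS = (0.0, 5.0)

if __name__ == "__main__":
    for ph in (7.0, 4.0, 1.0):
        n = protons_per_cycle(ph, HYPOTHETICAL_PKAS)
        print(f"luminal pH {ph:.1f}: about {n:.2f} H+ per cycle")
```

With these illustrative values the model gives close to two protons per cycle at neutral pH and slightly fewer than one at pH 1, mirroring the qualitative 2:2 versus 1:1 stoichiometry described above; the real pKa values of the H+,K+-ATPase sites are not stated here.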

In our structures, the carboxyl group of Glu820 is at the centre of the cation-binding site (Fig. 4b, c and Supplementary Video 2). This glutamate residue is surrounded by other polar amino acids, including Asn792, Glu795 and Lys791. The juxtaposition of the two glutamates Glu795 and Glu820 (2.5 Å between their closest oxygens) indicates that one of these acidic residues is protonated. Because the charge-neutralized Glu795Gln mutant exhibits an ATPase activity profile comparable to that of the wild-type enzyme and Glu820Gln does not (Fig. 4d), Glu795 rather than Glu820 is likely to be protonated. These two glutamate residues therefore interact through a hydrogen bond. Glu820 also receives hydrogen bonds from Asn792 (3.0 Å) and a water molecule (3.5 Å). In addition to this hydrogen-bond network around Glu820, the ε-amino group of Lys791 interacts intimately with the carboxylate of Glu820 (3.1 and 3.2 Å from Oε1 and Oε2, respectively), most probably forming a salt bridge in the crystal structure, as suggested by functional studies*®”*. The Glu820 carboxyl is thus situated in an unusual environment with extensive polar interactions that could lower its pKa value. A reduction of pKa values in the juxtaposed carboxyl groups of two adjacent acidic residues occurs in the catalytic centres of many other enzymes; for example, two aspartate residues 2.5 Å apart in the catalytic centre of pepsin*° were estimated to have pKa values of 1.2 and 4.7*1. Several aspartate residues located on the surface of pepsin that receive multiple hydrogen bonds from their surroundings and/or are coordinated by basic amino acids exhibit unusually low pKa values, carrying a negative charge even in the highly acidic environment of the stomach”. Therefore, Glu820 of H+,K+-ATPase is a strong candidate for one of the proton release sites, presumably releasing its proton through the luminally exposed Glu795 according to the Grotthuss mechanism (Fig. 5).

Fig. 5 | A model for proton extrusion into the acidic solution by the gastric H+,K+-ATPase. In the H+-occluded E1P state (left), all three glutamates in the cation-binding site are protonated; otherwise, H+ would be incorporated into the cation-binding site owing to its high concentration in the stomach when the luminal gate opens. In the luminal-open E2P state, the proton-binding affinity of the Glu820 carboxyl is strongly reduced because of its juxtaposition with Glu795, a hydrogen bond to Asn792 and a salt bridge with Lys791. As a consequence, a single H+ is expelled into the luminal acidic solution, presumably via Glu795, which is exposed to the surface of the luminal cavity in the structure (red arrows). Glu343 releases H+ only when the luminal solution is neutral to weakly acidic, according to its own pKa value (orange arrow). After H+
release, K+ is incorporated into the cation-binding site in the E2P-K+ form. Coordination of the K+ ion by Glu820 may release the salt bridge between Lys791 and Glu820 (grey dashed line), and Lys791 forms a new salt bridge with Asp824 (grey arrow). This sequence of actions triggers a conformational change of the whole enzyme to drive the transport cycle, and the counter-transported K+ ions are occluded in the luminal-closed (K+)2E2-P state (right). This mechanism is supported functionally by the constitutively active phenotype of the charge-neutralized mutants Glu820Gln and Asp824Asn* (Fig. 4d). See Supplementary Information, Extended Data Figs. 6 and 7, and Supplementary Video 3 for details of K+ binding.

Another glutamate, Glu343, in TM4 is also highly conserved among P2-type ATPases and is important for cation transport*. In our structures, Glu343 is located at some distance from Lys791 (7.4 Å) and from the other glutamate residues (5.7 Å and 4.7 Å from Glu795 and Glu820, respectively), and may therefore release an H+ only when the lumen is neutral to weakly acidic (Fig. 5). These glutamate residues (Glu343, Glu795 and Glu820) in the cation-binding site are invariant in the gastric H+,K+-ATPase α1 isoform. The Glu820 residue in H+,K+-ATPase α1, however, corresponds to a shorter aspartate in Na+,K+-ATPase and in the non-gastric H+,K+-ATPase α2, and the latter pump transports both H+ and Na+ (Fig. 4a). The longer side chain of glutamate in the gastric H+,K+-ATPase (Glu820) is likely to be better suited to the tight hydrogen-bond network that reduces its proton affinity, and may also underlie the H+ specificity of the E1 state.

Plant H+-ATPase, a similar H+-transporting pump, is structurally and functionally well characterized**. A crystal structure of AHA2 from Arabidopsis thaliana has revealed that two key residues for H+ transport, Arg655 and Asp684, which correspond to Glu795 and Asp824, respectively, in our structures, are located in the cation-binding site*’. It has previously been proposed* that these residues form a salt bridge as a consequence of the conformational change to the E2P state required to release H+, although the structure determined in that study is in the E1 state. The α2 isoform of brine shrimp Na+,K+-ATPase has two lysines, Lys324 and Lys776, in place of the residues corresponding to Tyr340 and Asn792 in H+,K+-ATPase, and the expression level of this isoform rises considerably when the salt concentration of the living environment increases**. These lysines in the brine shrimp Na+ pump may also play an important role in Na+ extrusion against a very steep Na+ gradient. The use of a basic amino acid to facilitate extrusion may be a general cation transport mechanism.
Any Methods, including any statements of data availability and Nature Research
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression and purification. The plasmids encoding the cDNA of pig gastric H+,K+-ATPase α- and β-subunits were provided by T. Imagawa. The Flag epitope tag (DYKDDDDK), a hexa-histidine tag and enhanced green fluorescent protein (EGFP), followed by a tobacco etch virus (TEV) protease recognition sequence, were attached to the amino terminus of Met48 of the α-subunit and cloned into a custom-made vector based on a previous report”. The pig gastric H+,K+-ATPase β-subunit (wild type) was also cloned independently. The α-β complex of H+,K+-ATPase was expressed in the plasma membrane using baculovirus-mediated transduction of mammalian HEK293S GnTI− cells, as previously described!. The collected cells were broken up using a high-pressure emulsifier (Avestin) in the presence of protease inhibitor cocktail (Roche), and membrane fractions were collected (200,000g for 1 h) after removing the cell debris (800g for 10 min). Membrane fractions were solubilized with 1% octaethylene glycol monododecyl ether (C12E8, Nikko Chemical) in the presence of 40 mM MES/Tris (pH 6.5), 20 mM Mg(CH3COO)2, 10% glycerol, 50 mM NaCl, 1 mM BeSO4, 3 mM NaF, 1 mM ADP, 5 mM dithiothreitol and 0.1 mM vonoprazan or SCH28080 on ice for 20 min. After removing the insoluble materials by ultracentrifugation (200,000g for 1 h), the supernatant was mixed with anti-Flag M2 affinity resin (Sigma Aldrich) for 2 h at 4 °C. The resin was washed with 20 column volumes of buffer consisting of 20 mM MES/Tris (pH 6.5), 5% glycerol, 2 mM MgCl2, 50 mM NaCl and 0.03% C12E8. Flag-EGFP-tagged H+,K+-ATPase was eluted with 0.2 mg/ml Flag peptide (Sigma Aldrich) in the presence of 10 μM vonoprazan or SCH28080. Eluted fractions were incubated with TEV protease and MBP-fusion endoglycosidase (New England Biolabs) at 4 °C overnight. The cleaved fragments containing EGFP and the MBP-fusion endoglycosidase were removed by passing the fractions through Ni-NTA resin (Qiagen) and amylose resin (New England Biolabs), respectively. Flow-through fractions were concentrated and subjected to size-exclusion chromatography on a Superose 6 Increase column (GE Healthcare) equilibrated in buffer comprising 10 mM MES/Tris (pH 6.5), 1% glycerol, 100 mM NaCl, 1 mM MgCl2 and 0.03% C12E8. Peak fractions were collected and concentrated to 10 mg/ml. The concentrated H+,K+-ATPase samples were mixed with 0.5 mM BeSO4, 1.5 mM NaF and 0.1 mM vonoprazan or SCH28080, then added to glass tubes in which a layer of dried dioleoyl phosphatidylcholine had formed, at a lipid-to-protein ratio of 0.3-0.5, and incubated overnight at 4 °C in a shaker mixer operated at 120 r.p.m.°”. After removing the insoluble materials by ultracentrifugation, the lipidated samples were used for crystallization. Note that the effect of deglycosylation on ATPase activity was negligible, as evaluated by K+ and P-CAB affinities compared with those of the wild type without endoglycosidase treatment, as well as of the native enzyme purified from pig stomach.
Crystallization. Crystals were obtained by vapour diffusion at 20 °C. For the vonoprazan-bound form, a 5 mg/ml purified, lipidated protein sample was mixed with reservoir solution containing 10% glycerol, 20% PEG2000MME, 0.4 M CH3COONa, 3% methylpentanediol and 5 mM β-mercaptoethanol. For the SCH28080-bound form, reservoir solution containing 10% glycerol, 20% PEG2000MME, 0.2 M RbCl, 5% tert-butanol and 5 mM β-mercaptoethanol was used. Vonoprazan crystals grew to 400 × 100 × 40 μm in 2 weeks, and SCH28080 crystals grew to 400 × 200 × 200 μm in 3 weeks. Crystals were flash frozen in liquid nitrogen.

Structural determination and analysis. Diffraction data were collected at SPring-8 beamlines BL32XU and BL41XU and processed using XDS. Structure factors were subjected to anisotropy correction using the UCLA MBI Diffraction Anisotropy server*® (http://services.mbi.ucla.edu/anisoscale/). The vonoprazan-bound structure was determined by molecular replacement with PHASER, using as a search model the homology model of BYK99-bound H+,K+-ATPase (PDB code: 5Y0B) based on the electron crystallographic structure. Coot was used for cycles of iterative model building, and Refmac5 and Phenix*® were used for refinement. The final crystallographic model of vonoprazan-bound H+,K+-ATPase at 2.80 Å resolution, refined to Rwork and Rfree of 0.237 and 0.288, was deposited in the PDB with accession code 5YLU. For determination of the SCH28080-bound structure, the vonoprazan-bound form was used as the starting model for molecular replacement, and the final crystallographic model at 2.80 Å resolution, refined to Rwork and Rfree of 0.240 and 0.292, was deposited in the PDB with accession code 5YLV. Rubidium ions were identified in anomalous difference Fourier maps calculated using data collected at a wavelength of 0.8147 Å. The vonoprazan-bound and SCH28080-bound models had 93.0, 6.8 and 0.2%, and 91.3, 8.1 and 0.6%, of residues in the favoured, allowed and outlier regions of the Ramachandran plot, respectively.
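For readers less familiar with the refinement statistics quoted above, the crystallographic R factor compares observed and model-calculated structure-factor amplitudes, and Rfree is the same statistic evaluated on a small set of reflections excluded from refinement. The sketch below is a simplified illustration with made-up amplitudes and a fixed scale factor; refinement programs such as Refmac5 and Phenix additionally handle scaling, bulk-solvent corrections and resolution binning, which are omitted here.

```python
"""Simplified crystallographic R factor (R_work / R_free) on toy data."""
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], scale: float = 1.0) -> float:
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) for one set of reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_r_free(f_obs, f_calc, is_free, scale=1.0):
    """Split reflections by the free-set flag and compute R for each subset."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free


if __name__ == "__main__":
    # Made-up amplitudes and free-set flags, for illustration only.
    f_obs = [120.0, 85.0, 60.0, 210.0, 45.0, 150.0]
    f_calc = [110.0, 90.0, 52.0, 200.0, 55.0, 130.0]
    free = [False, False, False, False, True, True]
    print(r_work_r_free(f_obs, f_calc, free))
```

Because free reflections do not influence refinement, Rfree is normally somewhat higher than Rwork, as for both models reported here.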

Activity assay using recombinant proteins. To measure ATPase activity, a Flag-EGFP tag connected by the TEV cleavage site to the N-terminal tail of the wild-type α-subunit was used to monitor expression by fluorescence size-exclusion chromatography*". The wild-type or mutant α-subunit was co-expressed with the wild-type β-subunit using the BacMam system as described above, and broken membrane fractions were collected. H+,K+-ATPase activity was measured as previously described. In brief, permeabilized membrane fractions (wild type or mutant) were suspended in buffer comprising 40 mM PIPES/Tris (pH 7.0), 2 mM MgCl2, 2 mM ATP and 0-30 mM KCl, in the presence of three different concentrations of vonoprazan or SCH28080 or in their absence, in 96-well plates. Reactions were initiated by incubating the fractions at 37 °C in a thermal cycler and maintained for 1 to 5 h depending on their activity. Reactions were then terminated, and the amount of released inorganic phosphate was determined colorimetrically using a microplate reader (TECAN). The inhibition constant (Ki) and K+ affinity (Km) were determined as previously described”. Note that the Flag-EGFP tag and the N-terminal 47 amino acids (not present in the crystallized sample) had negligible effects on ATPase activity, as evaluated by their K+ and P-CAB affinities compared with those of the tag-free wild-type enzyme as well as the native enzyme purified from pig stomach.
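The procedure used to obtain Km and Ki is only cited above, not detailed. As an illustration under stated assumptions, the sketch below fits the double-reciprocal (Lineweaver-Burk) form, matching the inverse plots of Extended Data Fig. 4, and then derives Ki from the shift in apparent Km expected for purely K+-competitive inhibition, Km,app = Km(1 + [I]/Ki). The rates are synthetic, generated from chosen parameters, and the 0 mM KCl point of the assay is left out because it cannot enter a reciprocal plot.

```python
"""Km/Vmax from a Lineweaver-Burk fit and Ki from competitive inhibition.

Assumptions (illustrative): Michaelis-Menten kinetics in [K+] and purely
K+-competitive inhibition, so Km_app = Km * (1 + [I] / Ki). All rates
are synthetic; none are measured values from this study.
"""
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(k_conc: Sequence[float], rates: Sequence[float]) -> Tuple[float, float]:
    """Return (Km, Vmax) from 1/v = (Km/Vmax) * (1/[K+]) + 1/Vmax.

    Zero concentrations must be excluded before calling.
    """
    slope, intercept = linear_fit([1.0 / s for s in k_conc], [1.0 / v for v in rates])
    vmax = 1.0 / intercept
    return slope * vmax, vmax


def ki_from_apparent_km(km: float, km_app: float, inhibitor_conc: float) -> float:
    """Invert Km_app = Km * (1 + [I]/Ki) for Ki (same units as [I])."""
    return inhibitor_conc / (km_app / km - 1.0)


def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate at substrate (here K+) concentration s."""
    return vmax * s / (km + s)


if __name__ == "__main__":
    k_mM = [0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0]  # nonzero [K+], mM
    VMAX, KM_MM, KI_NM, INHIBITOR_NM = 100.0, 1.2, 10.0, 20.0  # illustrative
    uninhibited = [michaelis_menten(s, VMAX, KM_MM) for s in k_mM]
    km_app_true = KM_MM * (1.0 + INHIBITOR_NM / KI_NM)
    inhibited = [michaelis_menten(s, VMAX, km_app_true) for s in k_mM]

    km, vmax = lineweaver_burk(k_mM, uninhibited)
    km_app, _ = lineweaver_burk(k_mM, inhibited)
    ki = ki_from_apparent_km(km, km_app, INHIBITOR_NM)
    print(f"Km = {km:.2f} mM, Vmax = {vmax:.1f}, Ki = {ki:.1f} nM")
```

Double-reciprocal fits weight the low-[K+] points heavily, so with noisy data a direct nonlinear fit of the Michaelis-Menten equation is usually preferable; the linear form is used here only because it mirrors the inverse plots and needs no external libraries.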

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Atomic coordinates and structure factors for the structures
reported in this work were deposited in the RCSB Protein Data Bank under acces- 
sion numbers 5YLU (vonoprazan-bound) and 5YLV (SCH28080-bound). All other 
data that support the findings of this study are available from the corresponding 
author upon reasonable request. 
Extended Data Fig. 1 | Crystallization of gastric H+,K+-ATPase. a, Purification of H+,K+-ATPase expressed in HEK293 cells. Lane 1: solubilized membrane fraction; lane 2: pass-through of Flag resin; lane 3: wash fraction; lane 4: elution by Flag peptide; lane 5: TEV protease- and endoglycosidase-treated sample; lane 6: pass-through fraction of Ni-NTA and amylose resin; lane 7: concentrated peak fractions from size-exclusion chromatography. b, Elution profile of affinity-purified H+,K+-ATPase on Superose 6 Increase 10/300. Black, red and green arrowheads indicate the elution volumes of aggregates, the α-β complex of H+,K+-ATPase and cleaved EGFP, respectively. Purification was well reproduced, and representative results are shown. c, d, Crystals of H+,K+-ATPase in the presence of vonoprazan (c) or SCH28080 and Rb+ (d). Scale bars, 100 μm. e, X-ray diffraction of the vonoprazan-bound crystal. The enlarged image shows diffraction spots up to 2.3 Å in the direction of the c* axis, although the crystal diffracts anisotropically. Most crystals showed diffraction spots up to 2.8 Å under similar crystallization conditions, and a few crystals showed diffraction spots better than 2.3 Å, as shown in the figure. f, Crystal packing. An asymmetric unit (molecule in the lower left, depicted as in Fig. 1d) contains one α-β complex of H+,K+-ATPase (α-subunit, light blue; β-subunit, wheat; vonoprazan, magenta), packed with P3121 symmetry. A unit cell and the approximate location of the membrane planes are shown as grey and yellow boxes, respectively.
[Panel b, y axis: O.D. 280 nm (mAbs)]
Extended Data Fig. 2 | Crystal structure of gastric H+,K+-ATPase bound to SCH28080. a, Overall structure of the luminal-open E2P state of H+,K+-ATPase complexed with SCH28080 ((SCH)E2BeF) in ribbon representation, as in Fig. 1a. Bound SCH28080 and three Rb+ ions are shown as green and purple spheres, respectively. Inset, chemical structure of SCH28080. b, Magenta mesh shows anomalous peaks from Rb+ contoured at the 5σ level, indicating that three Rb+ ions (blue, yellow and red boxes) are bound to H+,K+-ATPase (SCH)E2BeF (shown as colour ribbons). Blue, interface between the nucleotide domain and the actuator domain of the symmetry-related neighbouring molecules (grey ribbon). Yellow, K+-binding site at the phosphorylation domain, which is homologous to those of SERCA and Na+,K+-ATPase. Red, anomalous peak found at the transmembrane cation-binding site. c, The molecular surface of the (SCH)E2BeF structure, viewed from the luminal side of the membrane. Bound SCH28080 (green sticks) blocks the conduit connecting to the cation-binding site. d, Structure as in c, but with bound SCH28080 removed, showing that Rb+ bound to the cation-binding site (purple) is exposed to the luminal solution. e-g, The Cα traces of the indicated atomic models are superimposed on H+,K+-ATPase (Von)E2BeF (blue, with bound vonoprazan shown as spheres).
[Panel labels: H+,K+-ATPase (SCH)E2BeF; Na+,K+-ATPase (Ouabain)E2P; SERCA (Mg2+)E2BeF]
[Panel labels: luminal-open E2P; luminal-closed (K+)E2-P]

Extended Data Fig. 3 | TM2 helix and the hydrophobic cluster. a-d, Interface between the actuator and phosphorylation domains, and the cytoplasmic portion of TM2, in the luminal-open E2P state of H+,K+-ATPase (Von)E2BeF (a, c) and the luminal-closed E2P transition state of SERCA E2-AlF (PDB code: 2ZBG)!® (b, d). These two atomic models are superimposed according to the TM7-TM10 structure. The broken box on the whole molecular structure (upper left) indicates the region shown in a-d, viewed from the left (a, b) or front (c, d) of the molecule. The actuator domain (green), TM1-TM2 (blue) and TM3-TM4 (cyan) bundles are highlighted. Residues that contribute to the hydrophobic interactions’ (orange dotted circles) are shown as spheres, coloured like their respective structural components. Phe170 in H+,K+-ATPase is homologous to Tyr122 in SERCA. Because of the different coordination geometry between the phosphate analogues (BeF3−, light blue; AlF4−, pink) and the TGES motif (dark colour in each model) at the interface between the actuator and phosphorylation domains (see Fig. 1e for a close-up view), the azimuthal position of the actuator domain differs between the two structures (by approximately 30°, as indicated by the orange arrow in b). As a consequence, the cytoplasmic portion of TM2 adopts different conformations: an α-helix in the luminal-open E2P form (a, c) and an unwound loop in the luminal-closed E2-P form (b, d). The Cα positions of Ile119 and Met334 in H+,K+-ATPase (a gating latch) and their homologous residues in SERCA (Ile71 and Val300) are shown in red (see Fig. 3). e, Schematic of luminal gate closure in H+,K+-ATPase. In the luminal-open E2P state (left), Ile119 (TM1) and Met334 (TM4) act as a latch that keeps the TM1-TM2 bundle in the upright cytoplasmic-side position (dotted lines and arrows). Binding of the counter-transported K+ to the cation-binding site induces luminal gate closure (right), which is accompanied by lateral movement of the TM3-TM4 bundle (Fig. 1c) and downward sliding of the TM1-TM2 bundle (red arrows in the left panel). The sliding movement of TM1-TM2 results in unwinding of the cytoplasmic portion of TM2 and rotation of the actuator domain relative to the phosphorylation domain. Finally, the bound phosphate at the reaction centre of the phosphorylation domain is hydrolysed owing to the displacement of the TGES loop. Because the interaction between Ile119 and Met334 is missing in their alanine-substituted mutants (Fig. 3c), the TM1-TM2 bundle may slip, so the luminal gate closes spontaneously regardless of K+ binding to the cation-binding site. As a consequence, spontaneous dephosphorylation occurs, producing K+-independent ATPase activity.
Extended Data Fig. 4 | P-CAB-binding site. a, b, Inverse plots of 1/v versus 1/[K+] for the wild-type enzyme in the presence of different concentrations of P-CABs (vonoprazan: 0, 5, 10 and 20 nM (a); SCH28080: 0, 200, 500 and 1,000 nM (b); blue, green, yellow and red circles correspond to the respective P-CAB concentrations), showing typical K+-competitive inhibition of H+,K+-ATPase activity. Data represent mean ± s.e.m. of triplicate points at each of the indicated K+ concentrations; representative results from more than three independent measurements are shown. The chemical structures of the P-CABs are provided in each inset. c, d, The 2Fo − Fc electron density maps (contoured at 2σ) of the vonoprazan-binding (c) and SCH28080-binding (d) sites, viewed approximately parallel to the membrane plane. In d, bound SCH28080 is depicted in wheat for clarity. e, f, Cross-sections of the P-CAB-binding sites perpendicular to the membrane plane. The sectional surface is shown in light blue, and the molecular surface in light grey (carbon), with other colours corresponding to different elements (red, oxygen; blue, nitrogen; yellow, sulfur). Transparent spheres for each of the P-CABs represent their van der Waals radii, showing tight binding in the binding pocket. g, Structural comparison of the transmembrane region of vonoprazan-bound (magenta) and SCH28080-bound (green) H+,K+-ATPase and ouabain-bound Na+,K+-ATPase (wheat), viewed from the luminal side. Bound ouabain and the Mg2+ ion in the Na+,K+-ATPase structure are shown for clarity. h, Ouabain and the Mg2+ ion superimposed on the vonoprazan-bound H+,K+-ATPase structure (ribbons). Seven amino acids of H+,K+-ATPase for which mutation confers high-affinity ouabain binding are indicated (grey sticks), with the corresponding amino acids of Na+,K+-ATPase in parentheses. i, Bound SCH28080 superimposed on the structure shown in h. See Supplementary Information for details.
Extended Data Fig. 5 | Fo − Fc maps for P-CABs. The Fo − Fc density for vonoprazan (a) and SCH28080 (b), contoured at 5σ (blue mesh), is shown in stereo view. The amino acids involved in binding are shown as sticks.
Extended Data Fig. 6 | Cation-binding site in the Rb+-bound, luminal-open E2P state. a, b, Close-up of the cation-binding site in H+,K+-ATPase (SCH)E2BeF viewed approximately perpendicular to the membrane from the cytoplasmic side (a) and parallel to the membrane from the TM4 side (b). Residues whose neighbouring atoms are within 3.5 Å of each other are connected by dotted lines. Bound Rb+ (purple sphere) and water molecules (red) are also indicated. c, d, Comparison of the cation-binding site between Rb+-bound (SCH)E2BeF (colour ribbons) and (Von)E2BeF (wheat), showing the inclination of the Glu820 side chain towards Rb+ upon Rb+ binding (arrow). Only polar residues in the observed area are shown for clarity. e, f, The K+-occluded (K+)2E2-MgF state of Na+,K+-ATPase (light grey, PDB code: 2ZXE) superimposed on the Rb+-bound (SCH)E2BeF state of H+,K+-ATPase (colour ribbons). Pink spheres highlighted with red circles (sites I and II) indicate bound K+ in the Na+,K+-ATPase structure. Atomic models are aligned based on the TM7-TM10 part of the proteins. Arrows indicate displacement of the TM4 luminal portion from the luminal-open to the luminal-closed form. TM5 is removed from the structures shown in d and f for clarity.
Extended Data Fig. 7 | Hydrogen-bond networks. The transmembrane cation-binding site of H+,K+-ATPase (Von)E2BeF is shown, viewed from the TM6 side. Only polar residues are shown, and the distances between residues are provided. Spheres indicate the positions of the Na+-binding sites (I-III) in the Na+,K+-ATPase E1P-ADP state**. The proximity of Asp942 and Arg946 to one another indicates that these residues form a salt bridge.
Extended Data Table 1 | Data collection and refinement statistics

                            Vonoprazan (5YLU)             SCH28080 (5YLV)
Data collection
Space group                 P3121                         P3121
Cell dimensions
  a, b, c (Å)               104.82, 104.82, 367.08        105.05, 105.05, 368.54
  α, β, γ (°)               90, 90, 120                   90, 90, 120
Resolution (Å)†             3.2 × 3.2 × 2.8 (2.9-2.8)‡    3.0 × 3.0 × 2.8 (2.9-2.8)
Rmerge                      0.1201 (1.689)                0.1186 (2.35)
I/σI                        11.04 (1.54)                  9.88 (0.84)
Completeness (%)            87.51 (38.17)                 92.27 (40.65)
Redundancy                  5.2 (5.5)                     7.8 (8.2)
Refinement
Resolution (Å)              48.18-2.8 (2.9-2.8)           48.3-2.8 (2.9-2.8)
No. reflections             58652 (2197)                  59284 (2368)
Rwork/Rfree                 23.7/28.8 (38.2/42.7)         24.0/29.2 (41.6/47.8)
No. atoms                   9884                          9938
  Protein                   9612                          9728
  Ligand/ion                235                           181
  Water                     37                            29
B-factors                   43.75                         69.82
  Protein                   43.10                         69.39
  Ligand/ion                72.45                         95.51
  Water                     28.15                         55.10
R.m.s. deviations
  Bond lengths (Å)          0.011                         0.010
  Bond angles (°)           1.25                          1.21

†The diffraction data are anisotropic. The resolution limits given are for the a*, b* and c* axes, respectively.
‡Statistics for the highest-resolution shell are shown in parentheses.
Extended Data Table 2 | ATPase activity of evaluated mutants

Mutant | Vonoprazan: Ki (nM), × fold, distance (Å) | SCH28080: Ki (nM), × fold, distance (Å)

Wild-type 2442.3 1 - 150 + 10 1 - 
A123V 3yt:2:3 1.3 7.2 4000 + 70 27 3.4 
N138F 3.54 1.2 1.5 3.7 > 10000 > 67 3.1 
A335V 1241.2 5 3.7 > 10000 > 67 3.3 
A339S 1440.2 0.6 3.3 300 + 49 2.0 3.7 
E343Q 7941.5 3.3 3.6* 130+ 21 0.9 8.4 
600 + 42 4.0 7.0 
1700 + 180 11 4.3 
> 10000 > 67 33 
Y799F Oie2.7 3.8 3.3 1200 + 140 8 3.3 
L809F 12468 5 41 3300 + 350 22 3.6 
400 + 57 2.7 3.6 
360 + 52 2.4 4.9 


Effect of mutation on the inhibition constant (Ki) for the indicated P-CABs. Ki values represent the mean ± s.d. determined by K+-competitive inhibition of H+,K+-ATPase activity (n = 3 independent experiments). Fold increases (‘× fold’) in the Ki value of each mutant relative to wild type are provided for clarity. Distances between the indicated residues and the closest atom of each P-CAB in the crystal structure are also provided.
*Distance between the secondary amine of vonoprazan and the closest oxygen atom in the indicated residue. Mutants that considerably affected the affinity of vonoprazan, SCH28080 or both are highlighted in magenta, green or orange, respectively.
Extended Data Table 3 | ATPase activity of mutants

Mutant          Vmax (%)     Km (mM)        H+-ATPase (%)
Wild type       100 ± 3      1.2 ± 0.1      13
I119M           26 ± 1       0.9 ± 0.2      20
M334I           23 ± 1       0.4 ± 0.1      36
I119M/M334I     35 ± 1       0.9 ± 0.1      15
I119A           52 ± 2       -              54
M334A           70 ± 1       -              100
E343D†          -            -              -
E343Q           92 ± 3       5.5 ± 0.4      7
E795D           75 ± 1       26 ± 3.2       1
E795Q           125 ± 5      0.4 ± 0.1      30
E820D           90 ± 3       0.75 ± 0.1     9
E820Q           19 ± 2       -              96
D824E           26 ± 2       -              45
D824N           7 ± 0.8      -              73

SCH28080-sensitive ATPase activities of the wild type and the indicated mutants (Figs. 3, 4) were determined as in Fig. 3c, and the parameters are summarized. Data show the maximum ATPase activity (Vmax) of each mutant relative to the wild-type enzyme (4.3 μmol mg−1 h−1 in the membrane preparation), the K+ affinity (Km), and the fraction of H+-ATPase activity (ATPase activity in the absence of KCl) relative to the maximum H+,K+-ATPase activity of each mutant. The specific activity of each mutant enzyme was normalized to the expression level of H+,K+-ATPase determined by fluorescence size-exclusion chromatography. Values represent the mean ± s.e.m. determined by fitting 24 data points (triplicates at 8 different K+ concentrations) for each experiment.

†Data for the Glu343Asp mutant are not shown because it exhibited no detectable ATPase activity.
Alteration of the magnetosphere of the Vela pulsar during a glitch

Jim Palfreyman, John M. Dickey, Aidan Hotan, Simon Ellingsen & Willem van Straten


As pulsars lose energy, primarily in the form of magnetic dipole radiation, their rotation slows down accordingly. For some pulsars, this spin-down is interrupted by occasional abrupt spin-up events known as glitches1. A glitch is hypothesized to be a catastrophic release of pinned vorticity2 that provides an exchange of angular momentum between the superfluid outer core and the crust. This is manifested by a minute alteration in the rotation rate of the neutron star and its co-rotating magnetosphere, which is revealed by an abrupt change in the timing of observed radio pulses. Measurement of the flux density, polarization and single-pulse arrival times during the glitch with high time resolution may reveal the equation of state of the crustal superfluid, its drag-to-lift ratio and the parameters that describe its friction with the crust3. This has not hitherto been possible because glitch events happen unpredictably. Here we report single-pulse radio observations of a glitch in the Vela pulsar, which has a rotation frequency of 11.2 hertz. The glitch was detected on 2016 December 12 at 11:36 universal time, during continuous observations of the pulsar over a period of three years. We detected sudden changes in the pulse shape coincident with the glitch event: one pulse was unusually broad, the next pulse was missing (a ‘null’) and the following two pulses had unexpectedly low linear polarization. This sequence was followed by a 2.6-second interval during which pulses arrived later than usual, indicating that the glitch affects the magnetosphere.

In 2013 we began a three-year observing programme of the Vela pulsar with the aim of recording every single pulse during its next glitch (see Methods). On 2016 December 12 at 11:36 universal time (UT), a glitch of magnitude Δν/ν = 1.431 × 10⁻⁶ (where ν = 11.2 Hz is the rotation rate) was observed with both the 26-m telescope at Mount Pleasant, Tasmania, and the 30-m telescope at Ceduna, South Australia. Extended Data Table 1 shows the arrival times at the Solar System barycentre, as recorded by the two telescopes.
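The fractional glitch size can be converted into absolute changes in spin frequency and period. The sketch below uses only the two values in the text (Δν/ν and the rounded ν = 11.2 Hz), so its results are approximate; it relies on the first-order relation ΔP/P ≈ −Δν/ν, which follows from P = 1/ν.

```python
"""Convert the fractional glitch size into absolute frequency and period steps.

Inputs are the two numbers quoted in the text; nu = 11.2 Hz is rounded,
so the outputs are approximate.
"""

NU_HZ = 11.2                # Vela rotation frequency (rounded, from the text)
FRACTIONAL_STEP = 1.431e-6  # delta_nu / nu for the 2016 December 12 glitch

delta_nu_hz = FRACTIONAL_STEP * NU_HZ  # absolute spin-up, Hz
period_s = 1.0 / NU_HZ                 # rotation period, s
# Since P = 1/nu, a small frequency step gives delta_P / P ~ -delta_nu / nu.
delta_period_s = -FRACTIONAL_STEP * period_s

print(f"delta_nu ~ {delta_nu_hz:.3e} Hz")
print(f"P ~ {period_s * 1e3:.2f} ms, delta_P ~ {delta_period_s * 1e9:.1f} ns")
```

The step is of order 10 μHz in frequency, or roughly a tenth of a microsecond in period, so in any single rotation it is tiny; it becomes measurable through its cumulative effect on pulse arrival times.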

Figure 1 shows a plot of the arrival time residuals of single pulses 
recorded at Mount Pleasant over a time range of 72 min centred on the 
glitch. The residuals are the difference between the experimental data 
and the timing-model results for v and /, calculated using 36 min of 
single-pulse data obtained before the glitch. 
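The residuals in Fig. 1 compare each arrival time with a model fitted for ν and ν̇. Below is a minimal sketch of that comparison. It assumes barycentred arrival times in seconds and a simple two-term spin-down model. The real analysis used TEMPO2, which models many more effects, and `timing_residuals` is a hypothetical helper.

```python
def timing_residuals(toas_s, t_ref_s, nu_hz, nudot_hz_per_s):
    """Residuals (s) of arrival times against a simple spin-down model.

    Predicted rotational phase, in turns, with dt = t - t_ref:
        phi(t) = nu * dt + 0.5 * nudot * dt**2
    A pulse predicted perfectly by the model arrives at an integer phase; the
    leftover fractional turn, divided by nu, is the timing residual.
    """
    residuals = []
    for t in toas_s:
        dt = t - t_ref_s
        phase = nu_hz * dt + 0.5 * nudot_hz_per_s * dt * dt
        frac = phase - round(phase)      # offset from nearest pulse, in turns
        residuals.append(frac / nu_hz)   # turns -> seconds
    return residuals

# Synthetic check: pulses exactly on the model, except one arriving 1 ms late.
nu = 11.2
toas = [k / nu for k in range(100)]
toas[50] += 1e-3
res = timing_residuals(toas, 0.0, nu, 0.0)
print(f"{res[50] * 1e3:.3f} ms")  # → 1.000 ms
```

In a real fit, ν and ν̇ are the quantities adjusted to make these residuals as small as possible, as described in Methods.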

The inset of Fig. 1 shows a magnification of the plot around the time
of the glitch, t_g (vertical red line; see Methods). Near this time, three
very-low-probability events occurred: (1) a ‘null’, which followed an
unusually broad pulse, (2) a brief increase in the mean of the timing 
residuals, implying either a decrease in v or, more probably, a change 
in the magnetosphere that affected timings, and (3) a reduction in the 
variance of the timing residuals. 

Figure 2 shows 11 consecutive pulses including the ‘null’ that 
occurred at pulse number 77 (in the recorded file). Although pulses 
72-75 look typical, pulse 76 looks different: the flux is spread smoothly 
over about 10 ms, the entire width of the integrated pulse profile of the 
Vela pulsar. We have not seen a similarly broad pulse shape in the more 
than 100,000 pulses that we have examined. 

The pulse following this broad pulse is the ‘null’ pulse, and pulses
78 and 79 show minimal linear polarization, as demonstrated by the
absence of a position angle swing (right column of Fig. 2). Then, typical
pulse shapes are again observed from pulse 80 onwards. Analysis of 
data collected on other days shows that on average, the single-pulse 
flux density is below the detection threshold of the 26-m telescope 
once every 77,700 pulses. 

Fig. 1 | Timing residuals of single pulses near the time of the glitch.
The horizontal axis shows the arrival time at the Solar System barycentre
on modified Julian day 57,734, and the vertical axis shows the residual of
the arrival time, obtained from the pre-glitch model. The vertical red line
marks the fitted time of the glitch (t_g). The inset shows a magnification
of the plot. A ‘null’ occurred 3.3 s before t_g (at t_0), followed by an unusual
change in the timing residuals, with late mean arrival times and reduced
variances. Because the ‘null’ cannot be timed, it has been placed on the
0.0 ms line. The horizontal error bar represents the 1σ uncertainty in the
fitting of t_g.

Fig. 2 | A contiguous sequence of single pulses surrounding the ‘null’.
Each row corresponds to a single pulse, with time increasing from bottom
to top and the pulse number (in the recorded file) indicated in blue. The
‘null’ is pulse 77. For reference, the bottom row shows the integrated
pulse profile. The left panels show the total flux density in arbitrary units,
the middle panels show linear polarization and the right panels show
the position angle of the linear polarization. Circular polarization was
negligible and is not shown. The slight offset in the linear polarization
is due to off-pulse noise. Only about a fifth of the pulse period is shown.
The position angle is not plotted for pulses 78 and 79 because no linear
polarization was detected immediately after the ‘null’.

Although some pulsars show frequent null pulses, Vela does not,
and general pulsar observations indicate that nulls are not expected to
occur in young pulsars such as Vela. We cannot determine whether
pulse 77 in Fig. 2 is a true null, with zero flux emitted, a very faint pulse
that is below the detection threshold of the 26-m telescope, or even a
pulse with more severe broadening than pulse 76. However, such a
pulse is a rare event. The ‘null’ pulse appears at time t_0, only 3.3 s (37
pulsar rotations) before the best estimate of t_g, which has a 1σ
uncertainty of 2.5 s. The probability of a null appearing anywhere in the 37
rotations before the glitch is P = 4.8 × 10^−4.
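The quoted probability follows from the average null rate measured on other days (one pulse in 77,700 below the detection threshold). Here is a short sketch. It assumes, as a simplification of our own, that each rotation independently has that chance of being a null.

```python
NULL_RATE = 1 / 77_700   # average rate of pulses below the 26-m detection threshold
N_ROTATIONS = 37         # rotations between t_0 and the best estimate of t_g

def prob_at_least_one(rate: float, n: int) -> float:
    """Chance of one or more nulls in n rotations, if rotations are independent."""
    return 1.0 - (1.0 - rate) ** n

p = prob_at_least_one(NULL_RATE, N_ROTATIONS)
print(f"{p:.2e}")  # → 4.76e-04
```

Because the per-rotation rate is so small, the result is essentially 37/77,700, which rounds to the 4.8 × 10^−4 given above.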

Soon after the ‘null’, at t_1 = t_0 + 1.8 s (20 pulsar rotations), a
substantial change occurred in both the mean and the variance of the
timing residuals, which lasted for 2.6 s (29 pulsar rotations), until time
t_2. We searched two other full days of data (more than about 1.4 × 10^6
pulses) for a sequence of pulses of similar length and with a greater
change in the mean, combined with a smaller change in variance than
that observed here. None was found. Figure 3 shows a scatter plot of
the mean and standard deviation (σ) of the single-pulse timing residuals
over the 36-min period before the glitch, as shown in the left half of Fig. 1.
This extraordinary offset in the mean arrival times of the sequence of
pulses and the low corresponding variance suggest that the pulsar emission
mechanism was affected by the glitch process during this interval.
Fig. 3 | Scatter plot of the mean and standard deviation of single-pulse
timing residuals. Data are shown for the 36 min leading up to the glitch
(left half of Fig. 1), calculated using a sliding window of 21 data points.
The blue dots correspond to the period t_0–t_1 and the red outliers to
t_1–t_2. The connecting lines show how the sequence progresses. The units
are milliseconds.
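The point cloud in Fig. 3 comes from sliding a 21-point window along the residuals and recording each window's mean and standard deviation. Below is a minimal sketch. The caption does not state whether the sample or population standard deviation was used, or how windows were aligned. This sketch assumes full windows and the sample standard deviation, and `sliding_stats` is a hypothetical name.

```python
from statistics import mean, stdev

def sliding_stats(residuals_ms, window=21):
    """(mean, sample standard deviation) for every full window position.

    A series of N residuals yields N - window + 1 points, one per window.
    """
    if window < 2:
        raise ValueError("window must contain at least two points")
    return [
        (mean(residuals_ms[i:i + window]), stdev(residuals_ms[i:i + window]))
        for i in range(len(residuals_ms) - window + 1)
    ]

# Synthetic example: alternating +/-1 ms scatter with a late, tight run inside.
noisy = [(-1.0) ** k for k in range(60)]
late = [0.5] * 25
points = sliding_stats(noisy + late + noisy)
print(points[60])  # window lying wholly inside the late run → (0.5, 0.0)
```

A run of late, tightly clustered residuals appears as points with a high mean and a small σ. These are the kind of outliers highlighted in red in Fig. 3.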


Figure 4a shows a 260-s view of the timing residuals, with the ‘null’ at
t_0 marked; Fig. 4b provides the cumulative sum of the timing residuals,
and Fig. 4c shows the cumulative sum after glitch modelling has been
applied to the 72 min of data. The cumulative sums highlight overall
changes that are not apparent in the residual plot. The sequence of
pulses showing increased mean and reduced variance commences at
t_1 and finishes at t_2. Label t_3 marks what appears to be a permanent
speed-up in rotation after the glitch process has been completed.
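Why does a cumulative sum reveal changes that the residual plot hides? A small sustained offset is easily lost among scattered individual residuals, but it shows up in the running sum as a change of slope. The synthetic data in this sketch are illustrative only.

```python
from itertools import accumulate

def cumulative_residuals(residuals):
    """Running sum of timing residuals, in the units of the input."""
    return list(accumulate(residuals))

# Zero-mean +/-1 ms scatter plus a small sustained 0.5 ms offset on samples 10-19.
noise = [1.0 if k % 2 == 0 else -1.0 for k in range(30)]
offset = [0.5 if 10 <= k < 20 else 0.0 for k in range(30)]
cusum = cumulative_residuals([n + o for n, o in zip(noise, offset)])
print(cusum[9], cusum[19], cusum[29])  # → 0.0 5.0 5.0
```

The running sum stays level, then climbs while the offset lasts, and then levels off again at a new value. This is the pattern that marks the t_1–t_2 interval in Fig. 4b.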

We note that t_g can be fitted to a precision of only 2.5 s, but the ‘null’
pulse provides a fiducial time t_0 with a precision of the pulsar rotation
period, 89 ms. The timing of the spin-down, from t_1 to t_2, is based on the
sustained change in the mean and variance shown in the inset of Fig. 1.
Extended Data Table 2 shows the arrival times of these events at the
Solar System barycentre.
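The rotation counts quoted in the text and in Extended Data Table 2 come from dividing each interval by the rotation period. This small sketch assumes ν ≈ 11.19 Hz, which is Vela's frequency given to one more decimal place than the rounded 11.2 Hz used in the text. With exactly 11.2 Hz, the 43.8-s interval would round to 491 rotations rather than 490.

```python
NU_HZ = 11.19  # assumed value; the text rounds Vela's frequency to 11.2 Hz

# Intervals (s) between the events t_0, t_1, t_g, t_2 and t_3 described above.
INTERVALS_S = {
    "t_0 -> t_1": 1.8,
    "t_1 -> t_g": 1.5,
    "t_g -> t_2": 1.1,
    "t_1 -> t_2": 2.6,
    "t_0 -> t_g": 3.3,
    "t_0 -> t_2": 4.4,
    "t_2 -> t_3": 43.8,
}

def rotations(interval_s: float, nu_hz: float = NU_HZ) -> int:
    """Nearest whole number of pulsar rotations in an interval."""
    return round(interval_s * nu_hz)

for label, dt in INTERVALS_S.items():
    print(f"{label}: {dt:5.1f} s ~ {rotations(dt)} rotations")
```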

The 2.6 s from t_1 to t_2 could be associated with the unpinning process
of superfluid vortices, and the associated changes in angular momentum,
which are presumed to be the cause of pulsar glitches. An alternative
explanation is changes in the magnetosphere triggered by the glitch.
These changes could be caused by the unpinning of the vortices
affecting the magnetic flux tubes in the core.

The 4.4-s interval (49 pulsar rotations) between t_0 and t_2 may indicate
the rise time (τ_r) of the glitch, that is, the time required to transfer
angular momentum from the superfluid-permeated inner crust to the
outer crust. The rise time of the glitch has implications for the equation
of state. Sourie et al. compare the predictions of two equations of state,
the density-dependent hadronic (DDH) model and DDHδ, which takes into
account a scalar isovector interaction channel. For a pulsar mass of
1.3 M_⊙–1.6 M_⊙, where M_⊙ is the mass of the Sun, the DDH model
predicts a glitch rise time of 4–5.5 s and DDHδ predicts 2.5–3.5 s. If τ_r
is indeed 4.4 s, then DDH might be the preferred equation-of-state model.
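The model comparison above amounts to checking which predicted range contains the measured rise time. The sketch below encodes the two quoted ranges and adds an optional tolerance. The choice of one rotation period (about 0.09 s) as the tolerance is our own illustrative assumption, not the authors' error budget.

```python
# Predicted glitch rise times (s) for a 1.3-1.6 solar-mass pulsar, as quoted above.
PREDICTED_RISE_TIME_S = {"DDH": (4.0, 5.5), "DDHdelta": (2.5, 3.5)}

def consistent_models(rise_time_s, tolerance_s=0.0, predictions=PREDICTED_RISE_TIME_S):
    """Models whose predicted range, widened by +/- tolerance_s, contains the value."""
    return [
        name
        for name, (low, high) in predictions.items()
        if low - tolerance_s <= rise_time_s <= high + tolerance_s
    ]

print(consistent_models(4.4))         # → ['DDH']
print(consistent_models(4.4, 0.09))   # one rotation period of slack → ['DDH']
print(consistent_models(3.55))        # → []
print(consistent_models(3.55, 0.09))  # → ['DDHdelta']
```

The measured 4.4 s lies well inside the DDH range, so allowing for a single rotation of timing uncertainty does not change the conclusion.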

The 43.8-s interval (490 pulsar rotations) between t_2 and t_3 may
correspond to the time after the glitch when the crust and interior are 
synchronized, before their rotation rates become decoupled. 

Sedrakian & Cordes present a model in which the crustal magnetic
field provides a potential barrier against the superconducting proton
vortices in the core, which in turn act as a barrier to the superfluid
vortices that are trying to migrate outwards. On the basis of this model,
they predict that a glitch would affect the geometry of the pulsar’s
magnetic field. This may be what we have observed in the ‘null’ pulse
(pulse 77), the strange shape of pulse 76, and the loss of linear
polarization in pulses 78 and 79.

Fig. 4 | Timing residuals and their cumulative sum around the time of
the glitch. Residuals are shown for the 260 s around the time of the
glitch t_g (solid red line). a, Timing residuals (in milliseconds) similar to
those of Fig. 1, with no glitch modelling applied. b, Cumulative sum of the
timing residuals of a. c, Cumulative sum of timing residuals, after glitch
modelling has been applied. The events observed at times t_0–t_3 (see
text) are highlighted. Inset, magnified view of b showing t_0, t_1, t_g
and t_2. The horizontal error bar represents the 1σ uncertainty in the
fitting of t_g.

Fig. 5 | Peak flux density around the time of the ‘null’. The flux density
is shown in arbitrary units and the ‘null’ occurs at t_0 (vertical red line).
Data have been binned into 200-pulse (about 18 s) bins. The horizontal
lines indicate 1σ spacings.

We also observed a 3σ dip in the peak flux density for about 2 min
on either side of t_0 (see Fig. 5). Vela is known to emit bright pulses
that arrive between 1 ms and 1.5 ms before the main pulse. This 3σ dip,
combined with the reduced variance of the timing residuals, suggests
that fewer bright pulses were emitted from the magnetosphere in this
interval. The disruption of the magnetosphere could have caused the
normal coherent emission process to break down sufficiently to stop
the emission of bright pulses from the precursor region, where they are
usually seen. Changes in the particle bunching in the magnetosphere
could affect coherence, the radio flux density, the beaming direction
or the emission height.
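Fig. 5 bins the peak flux density into 200-pulse (about 18 s) bins and marks 1σ spacings. Below is a minimal sketch of that binning. It expresses each bin as a deviation in units of σ. How σ was estimated for the figure is not stated. Here we assume it is the scatter of the bin means themselves, and `binned_deviations` is a hypothetical helper.

```python
from statistics import mean, pstdev

def binned_deviations(peak_flux, bin_size=200):
    """Bin-averaged peak flux expressed as deviations in units of sigma.

    Consecutive pulses are averaged in bins of `bin_size`; any incomplete
    final bin is dropped. Each bin is compared with the mean of all bins,
    in units of the standard deviation of the bin means.
    """
    n_bins = len(peak_flux) // bin_size
    bins = [mean(peak_flux[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]
    if len(bins) < 2:
        raise ValueError("need at least two full bins")
    mu, sigma = mean(bins), pstdev(bins)
    if sigma == 0:
        return [0.0] * len(bins)
    return [(b - mu) / sigma for b in bins]

# Synthetic series: ten bins, one of which (bin 5) is fainter than the rest.
flux = [1.0] * 1000 + [0.0] * 200 + [1.0] * 800
dev = binned_deviations(flux)
print(round(dev[5], 2))  # → -3.0
```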

Future observations of single pulses associated with glitches in Vela 
may provide confirmation that glitches consistently cause null pulses 
or peculiar-shaped pulses. Observations with larger telescopes (or
telescope arrays) may probe this behaviour more deeply by determining
whether the ‘null’ is genuine, which will help us to resolve some of the
outstanding issues with regard to the internal mechanics and equations 
of state of neutron stars. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0001-x.
METHODS

Using the Mount Pleasant 26-m radio telescope, which is located near Hobart, 
Tasmania, we observed Vela when it was above the lower elevation limit, 4.3°, 
obtaining data for about 19 h each day. We also observed Vela with our
30-m telescope in Ceduna, South Australia. Both telescopes operated at a
centre frequency of 1,376 MHz and a bandwidth of 64 MHz. Although the
Ceduna dish is larger than that of the Mount Pleasant 26-m telescope, its
receiver is much less sensitive because it is not cooled to cryogenic
temperatures. Both telescopes have dual linearly polarized receivers.

We recorded about 14,000 h of baseband voltage data from Mount Pleasant in
both polarizations at a rate of 128 × 10^6 samples per second. Data from the Ceduna
telescope were recorded in a buffer and discarded until the glitch occurred. 

The baseband data files from both observatories were coherently de-dispersed,
detected and integrated into single pulses using DSPSR. In the time domain, each
rotation of the pulsar was divided into 8,192 phase intervals (giving a resolution
of 10.9 μs) and in the frequency domain, the 64-MHz band was divided into 16
sub-bands. PSRCHIVE was used for analysis, and polarization calibration was
performed by using Vela as a polarized reference source, but using 128 frequency
sub-bands and 1,024 phase intervals.
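The quoted 10.9-μs resolution is simply one rotation period divided into 8,192 phase intervals. The frequency channels are the 64-MHz band divided evenly. This short sketch reproduces both for the single-pulse and calibration configurations, assuming ν ≈ 11.19 Hz. The helper names are illustrative.

```python
NU_HZ = 11.19          # assumed Vela rotation frequency (text: 11.2 Hz)
BANDWIDTH_MHZ = 64.0   # recorded bandwidth, centred on 1,376 MHz

def phase_bin_width_us(n_bins: int, nu_hz: float = NU_HZ) -> float:
    """Duration of one phase interval when a rotation is split into n_bins."""
    return 1e6 / (nu_hz * n_bins)

def subband_width_mhz(n_subbands: int, bandwidth_mhz: float = BANDWIDTH_MHZ) -> float:
    """Width of each frequency channel when the band is split evenly."""
    return bandwidth_mhz / n_subbands

print(f"{phase_bin_width_us(8192):.1f} us, {subband_width_mhz(16):.1f} MHz")   # → 10.9 us, 4.0 MHz
print(f"{phase_bin_width_us(1024):.1f} us, {subband_width_mhz(128):.1f} MHz")  # → 87.3 us, 0.5 MHz
```

The coarser phase grid and finer frequency grid used for calibration trade time resolution for spectral detail. This is a reasonable trade when the goal is to characterize the frequency-dependent polarization response rather than individual pulses.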

The glitch epoch (t_g) was calculated using the TEMPO2 software and a
two-stage iterative process. First, we adjusted t_g to minimize the phase
offset (Δϕ). We modelled for changes in ν and ν̇ and set the long-term
glitch decay parameters to Δν_d = 1.29 × 10^−7 and τ_d = 0.96. Then, we used
an iterative process and stopped when Δϕ < 10^−7 (Δϕ = 6.98 × 10^−8 ms).

After this approximation, we adjusted t_g manually to minimize the
root-mean-square residuals in the arrival time (data minus model). Then, we
adjusted ν to minimize the root-mean-square residuals, and then ν̇. This was repeated several
times, until convergence was achieved. In each step of this process, the plot of the 
root-mean-square residuals was a parabola smooth enough to validate our best-fit 
determination. 
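A scan of trial values can be turned into a best-fit estimate by fitting a parabola to the root-mean-square residuals and taking its vertex. The sketch below assumes that the rms values have already been computed for each trial value, for example with a timing package. It does not describe TEMPO2 internals, and `parabola_minimum` is a hypothetical helper.

```python
import numpy as np

def parabola_minimum(trial_values, rms_values):
    """Vertex of the least-squares parabola through (trial, rms) points."""
    a, b, _ = np.polyfit(np.asarray(trial_values, dtype=float),
                         np.asarray(rms_values, dtype=float), deg=2)
    if a <= 0:
        raise ValueError("fitted curve has no minimum; widen the scan")
    return float(-b / (2.0 * a))

# Example: trial offsets (s) around an initial epoch and their rms residuals (ms).
offsets = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
rms = [0.2 * (x - 0.7) ** 2 + 1.1 for x in offsets]
print(round(parabola_minimum(offsets, rms), 3))  # → 0.7
```

The check on the sign of the leading coefficient reflects the validation step described above. If the curve does not open upwards, the scan has not bracketed a minimum and the fit should not be trusted.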
The Δν/ν and Δν̇/ν̇ values for the Mount Pleasant observations shown in
Extended Data Table 1 were obtained using four days of data, whereas the fitting
of the glitch epoch was based on 72 min of data. The corresponding results for
Ceduna were based on 15 h of data, but with only about 1 h of pre-glitch timings
available; thus, Δν/ν was not well constrained and Δν̇/ν̇ could not be
determined.

Data availability. Source Data files containing the data shown in the figures 
are available in the online version of the paper. The raw data were generated at 
the Mount Pleasant and Ceduna radio observatories, which are operated by the 
University of Tasmania, and are available from the corresponding author upon 
reasonable request. 

Code availability. The software DSPSR, TEMPO2 and PSRCHIVE are available 
at http://dspsr.sourceforge.net/, http://www.atnf.csiro.au/research/pulsar/tempo2/ 
and http://psrchive.sourceforge.net/, respectively. 
Extended Data Table 1 | Arrival times of the 2016 glitch of the Vela pulsar

Location         MJD             Time (UTC)    Δν/ν                 Δν̇/ν̇
Mount Pleasant   57734.484991    11:38:23.2    1431.24 × 10^−9      92 × 10^?
  uncertainty    ±2.9 × 10^−5    ±2.5 s        ±0.069 × 10^−9       ±0.83 × 10^?
Ceduna           57734.484973    11:38:20.7    1433.52 × 10^−9      N/A
  uncertainty    ±3.2 × 10^−5    ±2.8 s        N/A                  N/A

Arrival times at the Solar System barycentre were estimated on the basis of data recorded at each observatory. The last two columns list the relative change in rotation frequency and the relative
change in the first derivative of the rotation frequency. Uncertainties are 1σ. MJD, modified Julian date; UTC, coordinated universal time.
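The clock times in Extended Data Tables 1 and 2 correspond to the fractional part of each MJD multiplied by 86,400 s. This sketch converts between the two so that the columns can be cross-checked. It ignores leap seconds and time-scale differences, and `mjd_to_clock` is an illustrative name.

```python
def mjd_to_clock(mjd: float) -> str:
    """hh:mm:ss.s for the fractional day of a modified Julian date.

    Leap seconds and time-scale differences are ignored, and rounding to
    0.1 s can display 60.0 s just before a minute boundary.
    """
    seconds = (mjd % 1.0) * 86_400.0
    hours, rem = divmod(seconds, 3600.0)
    minutes, secs = divmod(rem, 60.0)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:04.1f}"

for mjd in (57734.4849521, 57734.484991, 57734.48551):
    print(mjd, mjd_to_clock(mjd))
```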
Extended Data Table 2 | Arrival times of key events at the Solar System barycentre

Event                    MJD              Time          Δt (s)    Rotations
t_0  null pulse          57734.4849521    11:38:19.9    1.8       20
t_1  spin-down starts    57734.4849738    11:38:21.7    1.5       17
t_g  glitch fit          57734.4849906    11:38:23.2    1.1       12
t_2  spin-down ends      57734.4850038    11:38:24.3    43.8      490
t_3  spin-up starts      57734.48551      11:39:08.1

The times t_g and t_0–t_3 are listed, as shown in Fig. 4. The last two columns list the time difference (Δt) and the number of pulsar rotations between each event and the next. MJD, modified Julian date.
Experimentally generated randomness certified by
the impossibility of superluminal signals 


Peter Bierhorst, Emanuel Knill, Scott Glancy, Yanbao Zhang, Alan Mink, Stephen Jordan, Andrea Rommal,
Yi-Kai Liu, Bradley Christensen, Sae Woo Nam, Martin J. Stevens & Lynden K. Shalm


From dice to modern electronic circuits, there have been many 
attempts to build better devices to generate random numbers. 
Randomness is fundamental to security and cryptographic 
systems and to safeguarding privacy. A key challenge with 
random-number generators is that it is hard to ensure that their 
outputs are unpredictable. For a random-number generator
based on a physical process, such as a noisy classical system or an 
elementary quantum measurement, a detailed model that describes 
the underlying physics is necessary to assert unpredictability. 
Imperfections in the model compromise the integrity of the device. 
However, it is possible to exploit the phenomenon of quantum non- 
locality with a loophole-free Bell test to build a random-number 
generator that can produce output that is unpredictable to any 
adversary that is limited only by general physical principles, such 
as special relativity. With recent technological developments,
it is now possible to carry out such a loophole-free Bell test.
Here we present certified randomness obtained from a photonic
Bell experiment and extract 1,024 random bits that are uniformly
distributed to within 10^−12. These random bits could not have been
predicted according to any physical theory that prohibits faster- 
than-light (superluminal) signalling and that allows independent 
measurement choices. To certify and quantify the randomness, 
we describe a protocol that is optimized for devices that are 
characterized by a low per-trial violation of Bell inequalities. Future 
random-number generators based on loophole-free Bell tests may 
have a role in increasing the security and trust of our cryptographic 
systems and infrastructure. 

The search for certifiably unpredictable random-number generators 
is motivated by applications, such as secure communication, for which 
the predictability of pseudorandom strings makes them unsuitable. 
Private randomness is required to initiate and authenticate virtually
every secure communication, and public randomness from randomness
beacons can be used for public certification and resource distribution
in many settings. To certify randomness, we can perform an experiment
known as a Bell test; in its simplest form, the Bell test involves
performing measurements on an entangled system with components
located in two physically separated measurement stations, where at
each station a choice is made between two types of measurement. After
multiple experimental trials with varying measurement choices, if the
measurement data violate conditions known as ‘Bell inequalities’, then
the data are certified to contain randomness under weak assumptions.

Our randomness generation uses a ‘loophole-free’ Bell test, which is 
characterized by high detection efficiency and space-like separation of 
the measurement stations during each experimental trial. The bits are 
unpredictable assuming (1) that the choices of measurement setting are 
independent of the experimental devices and of pre-existing classical 
information about them and (2) that, in each experimental trial, the 
measurement outcomes at each station are independent of the settings
at the other station. The first assumption is ultimately untestable, but
the premise that it is possible to choose measurement settings
independently of a system being measured is often tacitly invoked in the
interpretation of many scientific experiments and laws of physics.
The second assumption can be violated only if signals can be sent faster 
than the speed of light, given our trust that the space-like separation 
of the relevant events in the experiment is accurately verified by the 
timing electronics and that the results are final when recorded. We also 
trust that the classical computing equipment used to process the data 
operates according to specification. 

Under the above assumptions, the output randomness is certified to
be unpredictable with respect to a real or hypothetical actor ‘Eve’, who is
in possession of the pre-existing classical information, is physically
isolated from the devices while they are under our control and is without
access to data produced during the protocol. The bits remain
unpredictable to Eve if she learns the settings at any time after her last
interaction with the devices. If the devices are trusted, which is reasonable
if we built them, then this final interaction may be well before the start of
the protocol, in which case the settings can come from public randomness.
In particular, an existing public randomness source, such as the National
Institute of Standards and Technology (NIST) random beacon, can be used
to generate much-needed private randomness as output. Because the
assumptions do not constrain the specific physical realization of the
devices and do not require specific states or measurements, they
implement a ‘device-independent’ framework, which allows an individual
user to assure security with minimal assumptions about the devices.

Compared to other implementations of random-number generation
that invoke device-independence, our implementation is notable
because it enforces space-like separation between measurement
stations. Bell tests that achieve space-like separation without other
experimental loopholes have been performed only recently. It can be
argued that interaction between spatially (if not space-like) separated 
measuring stations can be assumed to be negligible. However, any 
shielding between the stations is necessarily incomplete; for example, 
there must be an open quantum channel to establish entanglement. 
Mundane physical effects, such as accidentally scattered photons, can 
allow predictable systems to appear to violate Bell inequalities when 
shielding is incomplete. Relying instead on the impossibility of faster- 
than-light communication provides stronger assurance of the unpre- 
dictability of the randomness. 

We generated randomness using an improved version of a recently
reported loophole-free Bell test (which was subsequently used
elsewhere). We collected five datasets, with the best-performing
one yielding 1,024 random bits that are uniformly distributed
to within 10^−12, as measured by the total variation distance (see
below). We also obtained 256 random bits from the main dataset
analysed previously, albeit uniform only to within 0.02; see
Supplementary Information section 6.

Fig. 1 | Diagram of the experiment. a, b, The relative locations of the
source (S), Alice (A) and Bob (B) are depicted in a. In each trial, the source
laboratory produces a pair of photons in a non-maximally polarization-
entangled state. One photon is sent to Alice's laboratory while the other
is sent to Bob’s laboratory to be measured, as shown in b. Alice and Bob
both use a fast Pockels cell (PC), two half-wave plates (HWPs), a quarter-
wave plate (QWP) and a polarizing beam displacer to switch between
their respective polarization measurements. A pseudorandom-number
generator (RNG) governs the choice of each measurement setting for each
trial. After passing through the polarization optics, the photons are sent
to a superconducting nanowire detector. The signals from the detector are
amplified and sent to a time tagger, where their arrival times are recorded
and the measurement outcome is fixed. Alice’s measurement outcome is
space-like separated from the triggering of Bob's Pockels cell and
vice versa.

The experiment, illustrated in Fig. 1, consisted of a source of entangled
photons and two measurement stations, named ‘Alice’ and ‘Bob’. During
an experimental trial, at each station a random choice was made between
two measurement settings, labelled 0 and 1, after which a measurement
outcome of detection (+) or non-detection (0) was recorded. Each station’s
implementation of the measurement setting was space-like separated
from the other station’s measurement event, and no postselection was
used in collecting the data; see Methods for details. For trial i, we model
Alice’s settings choices with the random variable X_i and Bob’s with Y_i,
both of which take values in the set {0, 1}. Alice’s and Bob’s
measurement-outcome random variables are A_i and B_i, respectively,
both of which take values in the set {+, 0}. When referring to a generic
single trial, we omit the i indices. With this notation, a general Bell
inequality for our scenario can be expressed in the form


$$\sum_{a,b,x,y} s_{abxy}\, P(A=a, B=b \mid X=x, Y=y) \le \beta \tag{1}$$

where s_abxy are fixed real coefficients indexed by a, b, x and y, which range
over all possible values of A, B, X and Y, and P denotes probability. The
upper bound β is required to be satisfied whenever the settings-conditional
outcome probabilities are induced by a model that satisfies ‘local realism’.
Local-realist distributions, which cannot be certified to contain
randomness, are those for which P(A = a, B = b | X = x, Y = y) is of the
form Σ_λ P(A = a | X = x, Λ = λ) P(B = b | Y = y, Λ = λ) P(Λ = λ) for
a random variable Λ that represents local hidden variables. The Bell
inequality is non-trivial if there exists a quantum-realizable distribution
that can violate the bound β.
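To make equation (1) concrete, the sketch below evaluates its left-hand side for a familiar example, the Clauser–Horne (CH) inequality. The single-station marginals are written in terms of the (x, y) = (0, 0) joint probabilities, which gives β = 0. The CH coefficients are our choice for illustration only. They are not the Bell function T that the experiment derives from training data.

```python
from itertools import product

OUTCOMES = ("+", "0")   # detection / non-detection
SETTINGS = (0, 1)

# CH inequality in the form of equation (1); local-realist bound beta = 0.
CH_COEFFS = {
    ("+", "+", 0, 0): -1.0,
    ("+", "+", 0, 1): 1.0,
    ("+", "+", 1, 0): 1.0,
    ("+", "+", 1, 1): -1.0,
    ("+", "0", 0, 0): -1.0,
    ("0", "+", 0, 0): -1.0,
}
CH_BETA = 0.0

def bell_value(coeffs, cond_prob):
    """Left-hand side of equation (1): sum of s_abxy * P(a, b | x, y)."""
    return sum(coeffs.get((a, b, x, y), 0.0) * cond_prob(a, b, x, y)
               for a, b, x, y in product(OUTCOMES, OUTCOMES, SETTINGS, SETTINGS))

def always_detect(a, b, x, y):
    """A local deterministic strategy: both stations always report '+'."""
    return 1.0 if (a, b) == ("+", "+") else 0.0

def pr_box(a, b, x, y):
    """Popescu-Rohrlich box: non-signalling but not quantum-realizable."""
    bit = {"+": 1, "0": 0}
    return 0.5 if (bit[a] ^ bit[b]) == (x & y) else 0.0

print(bell_value(CH_COEFFS, always_detect))  # → 0.0 (saturates beta)
print(bell_value(CH_COEFFS, pr_box))         # → 0.5 (exceeds beta)
```

Every local deterministic strategy gives a value of at most 0. Quantum mechanics can reach (√2 − 1)/2 ≈ 0.21 on this expression, and the PR box reaches 0.5. The PR box is included to show that non-signalling alone permits even larger violations.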

It has long been known that experimental violations of Bell inequalities 
such as equation (1) indicate the presence of randomness in data. To 
quantify randomness with respect to Eve, we represent Eve's initial
classical information by a random variable E. We formalize the assumption
that measurement settings can be generated independently of the system
being measured and of Eve's information with the following condition:


$$P(X_i=x, Y_i=y \mid E=e, \mathrm{past}_i) = P(X_i=x, Y_i=y) = \frac{1}{4} \quad \forall\, x, y, e \tag{2}$$


where past_i represents events prior to the ith trial, specifically including
the trial settings and outcomes for trials 1 to i − 1. Our other assumption,
that measurement outcomes are independent of remote measurement
choices, is formalized as follows:


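$$\begin{aligned}
P(A_i=a \mid X_i=x, Y_i=y, E=e, \mathrm{past}_i) &= P(A_i=a \mid X_i=x, E=e, \mathrm{past}_i) \quad \text{and} \\
P(B_i=b \mid X_i=x, Y_i=y, E=e, \mathrm{past}_i) &= P(B_i=b \mid Y_i=y, E=e, \mathrm{past}_i) \quad \forall\, a, b, x, y, e
\end{aligned} \tag{3}$$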


These equations are commonly referred to as the ‘non-signalling’ 
assumptions, although they are often stated without the conditionals
E and past_i. Our space-like separation of settings and remote
measurements provides assurance that the experiment obeys equation (3).
If we were to assume that the measured systems obey quantum physics,
then stronger constraints are possible.
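Equation (3) says that one station's outcome statistics must not depend on the other station's setting. For a single-trial conditional distribution, this can be checked by comparing marginals, as in the sketch below. The sketch drops the extra conditioning on E and past_i, and the two example models are hypothetical.

```python
from itertools import product

OUTCOMES = ("+", "0")
SETTINGS = (0, 1)

def signalling_gap(cond_prob):
    """Largest shift in either station's marginal caused by the remote setting.

    Returns 0 (up to rounding) when P(a, b | x, y) satisfies the
    no-signalling conditions of equation (3), here without the extra
    conditioning on E and past_i.
    """
    gap = 0.0
    for a, x in product(OUTCOMES, SETTINGS):       # Alice's marginal vs Bob's y
        m = [sum(cond_prob(a, b, x, y) for b in OUTCOMES) for y in SETTINGS]
        gap = max(gap, abs(m[0] - m[1]))
    for b, y in product(OUTCOMES, SETTINGS):       # Bob's marginal vs Alice's x
        m = [sum(cond_prob(a, b, x, y) for a in OUTCOMES) for x in SETTINGS]
        gap = max(gap, abs(m[0] - m[1]))
    return gap

def pr_box(a, b, x, y):
    """Non-signalling correlations: outcomes agree unless x = y = 1."""
    bit = {"+": 1, "0": 0}
    return 0.5 if (bit[a] ^ bit[b]) == (x & y) else 0.0

def bob_sees_x(a, b, x, y):
    """A signalling model: Bob's detector fires exactly when Alice chose x = 1."""
    return 1.0 if a == "+" and b == ("+" if x == 1 else "0") else 0.0

print(signalling_gap(pr_box), signalling_gap(bob_sees_x))  # → 0.0 1.0
```

A model like `bob_sees_x` is what space-like separation is meant to rule out. Without it, mundane leakage between stations could produce such dependence.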

Given equations (2) and (3), our protocol produces random bits 
in two sequential parts. For the first part, ‘entropy production’, we
implement n trials of the Bell test, from which we compute a statistic
V that is related to a Bell inequality (equation (1)). V quantifies the Bell
violation and determines whether the protocol passes or aborts.
If the protocol passes, then we certify an amount of randomness in
the outcome string whether or not Eve has access to the setting string.
In the second part, ‘extraction’, we process the outcome string into a
shorter string of bits, the distribution of which is close to uniform.
We used our customized implementation of the Trevisan extractor
derived from the framework of Mauerer, Portmann and Scholz and
the associated open-source code. We call this the TMPS algorithm; see 
Supplementary Information section 4 for details. 

We applied a new method of certifying the amount of randomness
in Bell tests. Previous methods for related models with various sets of
assumptions are ineffective in our experimental regime (see
Supplementary Information section 7), which is characterized by a
small per-trial violation of Bell inequalities. Other recent works that
explore ways of effectively certifying randomness from a wider range
of experimental regimes assume that measured states are independent
and identically distributed (i.i.d.) or that the regime is asymptotic.
Our method, which does not require these assumptions, builds on
the prediction-based ratio method for rejecting local realism.
Applying this method to training data (see below), we obtain a
real-valued Bell function T with arguments A, B, X and Y that satisfies
T(A, B, X, Y) ≥ 0 with expectation E(T) ≤ 1 for any local-realist
distribution that satisfies equation (2). From T we determine the
maximum value 1 + m of E(T) over all distributions that satisfy equations
(2) and (3), where we require that m > 0. Such a function T induces a
Bell inequality (equation (1)) with β = 4 and s_abxy = T(a, b, x, y),
because each of the four setting pairs occurs with probability 1/4. Define
T_i = T(A_i, B_i, X_i, Y_i) and V = ∏_{i=1}^{n} T_i; if the experimenter
observes a value of V larger than 1, this indicates a violation of the Bell
inequality and the presence of randomness in the data. The randomness
is quantified by the ‘entropy production theorem’ (see below), which we
prove in Supplementary Information section 2. We denote all of the
settings of both stations with XY = X_1Y_1X_2Y_2...X_nY_n; other
sequences such as AB and ABXY are similarly interleaved over n trials.
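With tens of millions of trials and every T_i close to 1, forming V = ∏ T_i directly can overflow or underflow floating-point arithmetic. The sketch below accumulates log2 V instead and compares it with log2 of the threshold. The `toy_T` function is hypothetical and is not a valid Bell function. It is used only to show the arithmetic.

```python
import math

def log2_statistic(trials, bell_function):
    """log2 of V = prod_i T(A_i, B_i, X_i, Y_i).

    Summing logarithms avoids overflow or underflow when n is in the tens of
    millions and each factor is close to 1.
    """
    total = 0.0
    for a, b, x, y in trials:
        t = bell_function(a, b, x, y)
        if t < 0:
            raise ValueError("a Bell function must be non-negative")
        if t == 0:
            return -math.inf   # V = 0, so the protocol cannot pass
        total += math.log2(t)
    return total

def passes(trials, bell_function, v_thresh):
    """True when V >= v_thresh, compared on a log2 scale."""
    return log2_statistic(trials, bell_function) >= math.log2(v_thresh)

def toy_T(a, b, x, y):
    """Hypothetical function for illustration only; not a valid Bell function."""
    return 2.0 if (a, b, x, y) == ("+", "+", 0, 0) else 1.0

trials = [("+", "+", 0, 0)] * 3 + [("0", "0", 1, 1)]
print(log2_statistic(trials, toy_T), passes(trials, toy_T, 8.0))  # → 3.0 True
```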

The entropy production theorem is as follows. Suppose T is a Bell 
function that satisfies the above conditions. Then, in an experiment of 
n trials that obey equations (2) and (3), the following inequality holds 
for all ε_p ∈ (0, 1) and v_thresh satisfying 1 ≤ ε_p v_thresh ≤ [1 + (3/2)m]^n:

$$P_e\left(P_e(AB \mid XY) > \delta \;\text{AND}\; V \ge v_{\mathrm{thresh}}\right) \le \epsilon_p \tag{4}$$

where

$$\delta = \left[1 + \frac{1 - (\epsilon_p v_{\mathrm{thresh}})^{1/n}}{2m}\right]^n$$

and P_e denotes the probability distribution conditioned on the event E = e,
where e is arbitrary. The expression P_e(AB | XY) denotes the random
variable that takes the value P_e(AB = ab | XY = xy) when ABXY takes the
value abxy.

In words, this theorem says that, with high probability, if V is at least
as large as v_thresh then the output AB is unpredictable, in the sense
that no individual outcome AB = ab occurs with probability higher
than δ, even given the information XYE = xye. The theorem supports
a protocol that aborts if V takes a value less than v_thresh and passes
otherwise. If the probability of passing were 1, then −log2(δ) would
be a so-called ‘smooth min-entropy’, a quantity that characterizes
the number of uniformly distributed bits of randomness that are in
principle available in AB. We show in Supplementary Information
section 3 that, for constant ε_p, −log2(δ) is proportional to the number of
trials. The number of bits that we can actually extract depends on ε_fin,
the maximum allowed distance of the final output from uniform. We
also show in Supplementary Information section 2 that the entropy 
production theorem can be proved even if the settings probabilities 
are not known exactly. 
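The entropy production theorem gives −log2(δ) in closed form. Computed naively, however, it involves an n-th root and an n-th power of numbers very close to 1. This sketch evaluates it in log space with `log1p` and `expm1` and enforces the range condition on ε_p v_thresh. The function name is illustrative, and the parameter values in the checks are arbitrary rather than the experiment's.

```python
import math

def min_entropy_bits(eps_p, v_thresh, n, m):
    """-log2(delta) from the entropy production theorem.

    delta = [1 + (1 - (eps_p * v_thresh)**(1/n)) / (2m)]**n, evaluated in
    log space because n is large and every per-trial factor is close to 1.
    """
    if not 0.0 < eps_p < 1.0 or m <= 0.0 or n < 1:
        raise ValueError("need 0 < eps_p < 1, m > 0 and n >= 1")
    log_ev = math.log(eps_p) + math.log(v_thresh)   # ln(eps_p * v_thresh)
    upper = n * math.log1p(1.5 * m)                 # ln([1 + (3/2) m]**n)
    if log_ev < -1e-12 or log_ev > upper * (1 + 1e-12):
        raise ValueError("theorem requires 1 <= eps_p * v_thresh <= (1 + 1.5 m)**n")
    # 1 - (eps_p * v_thresh)**(1/n) equals -expm1(log_ev / n)
    per_trial = math.log1p(-math.expm1(log_ev / n) / (2.0 * m))
    return -n * per_trial / math.log(2.0)

print(min_entropy_bits(0.5, 2.0, 10, 0.1))               # lower end: 0 bits
print(min_entropy_bits(0.5, 2.0 * 1.15 ** 10, 10, 0.1))  # upper end: ~20 bits (2 per trial)
```

The two ends of the allowed range correspond to no certified randomness at all and to the maximum of two bits per trial, since δ then equals (1/4)^n. For large n, a first-order expansion shows that −log2(δ) approaches log2(ε_p v_thresh)/(2m).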

To extract the available randomness in AB, we use the TMPS algo- 
rithm to obtain an extractor, specifically a function Ext that takes as 
inputs the string AB and a ‘seed’ bit string S of length d, where S is 
uniform and independent of ABXY. Its output is a bit string of length 
t. S can be obtained from d additional instances of the random
variables X_i, so equation (2) ensures the independence and uniformity
conditions on S that are needed. For the output to be within a distance
ε_fin of uniform independently of XY and E, the entropy production
and extractor parameters must satisfy the constraints given in the
‘protocol soundness theorem’, which we prove in Supplementary
Information section 5. In the statement of the theorem, the measure
of distance used is the total variation distance, which is expressed
by the left-hand side of equation (6), and ‘pass’ is the event that
V ≥ v_thresh.
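The TMPS extractor (Trevisan's construction) is too involved for a short example. To illustrate what a seeded extractor does, the sketch below swaps in a different and simpler technique, Toeplitz hashing, which the authors did not use. Its soundness follows from the leftover hash lemma, so its parameter constraints differ from equation (5). It also makes a design point visible. Toeplitz hashing needs n + t − 1 seed bits, so if each trial's outcome pair were encoded as two bits (our assumption), the seed would be roughly as long as the data. By contrast, the seed reported below for the TMPS extractor is d = 315,844 bits.

```python
def toeplitz_extract(raw_bits, seed_bits, t):
    """Hash n raw bits to t output bits with a seeded Toeplitz matrix over GF(2).

    Entry M[i][j] of the t x n matrix is seed_bits[i - j + n - 1]; it is
    constant along each diagonal, so n + t - 1 seed bits define it completely.
    """
    n = len(raw_bits)
    if len(seed_bits) != n + t - 1:
        raise ValueError("Toeplitz hashing needs n + t - 1 seed bits")
    out = []
    for i in range(t):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & raw_bits[j]   # GF(2) multiply-add
        out.append(acc)
    return out

print(toeplitz_extract([1, 0, 1, 1], [1, 1, 0, 1, 0], 2))  # → [1, 1]
```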

The protocol soundness theorem is as follows. Let 0 < ε_ext, κ < 1.
Suppose that P(pass) ≥ κ and that the protocol parameters satisfy

$$t + 4\log_2(t) \le -\log_2(\delta) + \log_2(\kappa) + 5\log_2(\epsilon_{\mathrm{ext}}) - 11 \tag{5}$$

Then, the output U = Ext(AB, S) of the function obtained by the TMPS
algorithm satisfies

$$\frac{1}{2}\sum_{u,\,xyse} \Big| P(U=u, XYSE=xyse \mid \text{pass}) - P^{\mathrm{unif}}(U=u)\, P(XYE=xye \mid \text{pass})\, P^{\mathrm{unif}}(S=s) \Big| \le \frac{\epsilon_p}{P(\text{pass})} + \epsilon_{\mathrm{ext}} \tag{6}$$

where P^unif denotes the uniform probability distribution.

The number of seed bits d that are required satisfies
d = O[log(t) log(nt/ε_ext)^2]; we provide an explicit bound in Supplementary
Information section 4. The protocol soundness theorem enables us to
quantify the uniformity of the randomness that is produced with an
overall final error parameter of ε_fin = max(ε_p/κ + ε_ext, κ). (This choice
of error parameter is conservative; see Supplementary Information
section 5.) For any probability of passing greater than ε_fin, the total
variation distance from uniform (conditionally on passing) is at
most ε_fin, because P(pass) then exceeds κ and the right-hand side of
equation (6) is at most ε_p/κ + ε_ext.
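Equation (5) and the definition of ε_fin can be turned into a small parameter calculator. This sketch computes the min-entropy that equation (5) requires for a given output length and the largest output length that a given −log2(δ) supports. It also computes ε_fin, using κ = 9.5 × 10^−13, which follows from the values ε_p = κ² = 9.025 × 10^−25 and ε_ext = 5 × 10^−14 reported below. The binary search is our own choice and does not reproduce the authors' numerical study.

```python
import math

def required_min_entropy(t, kappa, eps_ext):
    """Smallest -log2(delta) allowed by equation (5) for t output bits."""
    return t + 4 * math.log2(t) - math.log2(kappa) - 5 * math.log2(eps_ext) + 11

def max_output_bits(neg_log2_delta, kappa, eps_ext):
    """Largest t satisfying equation (5), found by binary search (0 if none)."""
    budget = neg_log2_delta + math.log2(kappa) + 5 * math.log2(eps_ext) - 11
    if budget < 1:                  # even t = 1 needs a budget of at least 1
        return 0
    lo, hi = 1, int(budget) + 1     # lo satisfies (5); hi cannot, since t > budget
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid + 4 * math.log2(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def final_error(eps_p, kappa, eps_ext):
    """eps_fin = max(eps_p / kappa + eps_ext, kappa)."""
    return max(eps_p / kappa + eps_ext, kappa)

kappa, eps_ext = 9.5e-13, 5e-14
print(f"{final_error(kappa ** 2, kappa, eps_ext):.1e}")      # → 1.0e-12
print(round(required_min_entropy(1024, kappa, eps_ext), 1))  # → 1335.9
```

Choosing ε_p = κ² makes the two arguments of the max nearly balanced: ε_p/κ = κ, and the small ε_ext brings the total to exactly 10^−12.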

We applied our protocol to five datasets using a set-up based on that
described previously!’, with improvements described in Methods. Each
dataset was collected in 5–10 min. Before starting the protocol, we set
aside the first 5 × 10^6 trials of each dataset as training data, which we
used to choose the parameters that are needed by the protocol. With the
training data removed, the number n of trials used by the protocol was
between 2.5 × 10^7 and 5.5 × 10^7 for each dataset. We used the training
data to determine a Bell function T with statistically strong violation
of local realism on the training data according to the prediction-based
ratio method*’; see Supplementary Information section 3. The
function T obtained for the fifth dataset, which was the longest in duration
and produced the most randomness, assigned values between 0.927
and 1.004 to the 16 different experimental outcomes. We computed
thresholds V_thresh so that a sample of n i.i.d. trials from the
distribution inferred from the training data would have a high probability of
exceeding V_thresh.
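The passing rule compares V = ∏ T_i with V_thresh. With n in the tens of millions and every T_i close to 1, it is safer to work with log2(V) = Σ log2(T_i). The sketch below chooses log2(V_thresh) for a target passing probability using a normal (central-limit) approximation to that sum. The approximation, the two-valued toy distribution of T and the function names are assumptions for illustration; this section does not spell out how the thresholds were actually computed.

```python
import math
import random
from statistics import NormalDist


def log2_threshold(t_values, probs, n, pass_prob=0.99):
    """log2(V_thresh) such that the sum of n i.i.d. log2(T_i) exceeds it with
    probability about pass_prob, using a normal approximation to the sum."""
    logs = [math.log2(t) for t in t_values]
    mean = sum(p * l for p, l in zip(probs, logs))
    var = sum(p * (l - mean) ** 2 for p, l in zip(probs, logs))
    z = NormalDist().inv_cdf(pass_prob)  # one-sided quantile, about 2.33 for 0.99
    return n * mean - z * math.sqrt(n * var)


def protocol_passes(observed_t, log2_v_thresh):
    """Pass iff prod(T_i) >= V_thresh, evaluated in log space to avoid overflow."""
    return sum(math.log2(t) for t in observed_t) >= log2_v_thresh


# Toy model: T takes only the two extreme values quoted for dataset 5.
t_values, probs, n = [1.004, 0.927], [0.99, 0.01], 5_000
thresh = log2_threshold(t_values, probs, n)
rng = random.Random(1)
passes = sum(protocol_passes(rng.choices(t_values, probs, k=n), thresh) for _ in range(100))
print(f"log2(V_thresh) = {thresh:.2f}; passed {passes} of 100 simulated runs")
```

Simulating many runs from the same model is a cheap way to check that the normal approximation gives roughly the intended passing rate before committing to a threshold.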

For the fifth dataset, a sample of n i.i.d. trials from the distribution
inferred from the training data would have a probability of
approximately 0.99 of exceeding a threshold of V_thresh = 1.5 × 10°2. Exceeding
this threshold would allow the extraction of 1,024 bits that are
uniformly distributed to within ε_fin = 10^−12, using ε_p = κ² = 9.025 × 10^−25
and ε_ext = 5 × 10^−14. These values were chosen on the basis of a
numerical study of the constraints on the number t of bits extracted for fixed
values of ε_fin = 10^−12. Running the protocol on the remaining
55,110,210 trials with these parameters, the product ∏_{i=1}^{n} T_i exceeded
V_thresh, and so the protocol passed. Applying the extractor to the
resulting output string AB with a seed of length d = 315,844, we extracted
1,024 bits, certified to be uniform to within 10^−12, the first ten of which
are 1110001001. In Fig. 2 we display the extractable bits for alternative
choices of ε_fin for all five datasets.
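The dataset-5 parameters follow from ε_fin = 10^−12 and the split ε_p = κ² = (0.95ε_fin)², ε_ext = 0.05ε_fin stated in the Fig. 2 caption. The short Python check below recomputes them and evaluates the final error, read here as ε_fin = max(ε_p/κ + ε_ext, κ); the function names are illustrative.

```python
def split_error_budget(eps_fin: float, frac: float = 0.95):
    """Return (kappa, eps_p, eps_ext) with kappa = frac*eps_fin, eps_p = kappa**2,
    eps_ext = (1 - frac)*eps_fin."""
    kappa = frac * eps_fin
    return kappa, kappa ** 2, (1.0 - frac) * eps_fin


def final_error(eps_p: float, kappa: float, eps_ext: float) -> float:
    """Overall error parameter max(eps_p/kappa + eps_ext, kappa)."""
    return max(eps_p / kappa + eps_ext, kappa)


kappa, eps_p, eps_ext = split_error_budget(1e-12)
print(f"kappa = {kappa:.3g}, eps_p = {eps_p:.4g}, eps_ext = {eps_ext:.3g}")
print(f"eps_fin = {final_error(eps_p, kappa, eps_ext):.3g}")
```

With this split ε_p/κ = κ, so the first argument of the max is 0.95ε_fin + 0.05ε_fin = ε_fin, while κ = 0.95ε_fin is smaller; the maximum is therefore ε_fin itself.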

For the dataset that produced 1,024 new near-random bits, our
protocol used 1.10 × 10^8 uniform bits to choose the settings and 3.16 × 10^5
uniform bits to choose the seed. The strong extractor property”® of the
TMPS algorithm ensures that the seed bits are still uniform, conditional
on passing, so they can be recovered at the end of the protocol for use
elsewhere. This is not the case for the settings-choice bits because the
probability of passing is less than 1. To reduce the entropy used for the
settings, our protocol can be modified to use highly biased settings
choices®. Reducing settings entropy is not a priority if the settings and
seed bits come from a public source of randomness, in which case the
output bits can still be certified to be unknown to external observers
such as Eve, and the current protocol is an effective method for private
randomness generation”.
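The settings figure is consistent with one uniform bit per party per trial. The sketch below, which assumes binary settings chosen independently by Alice and Bob at every trial, tallies the randomness consumed and returned for dataset 5. Because the seed bits are recoverable under the strong extractor property, only the settings bits count as spent.

```python
def randomness_budget(n_trials: int, seed_bits: int, output_bits: int, parties: int = 2):
    """Tally uniform bits consumed for settings, bits lent as seed, and bits produced.
    Assumes one binary setting choice per party per trial."""
    settings_bits = parties * n_trials
    return {
        "settings_consumed": settings_bits,  # not recoverable, since P(pass) < 1
        "seed_recoverable": seed_bits,       # reusable after a passing run
        "output": output_bits,
    }


budget = randomness_budget(n_trials=55_110_210, seed_bits=315_844, output_bits=1_024)
for name, bits in budget.items():
    print(f"{name:>18}: {bits:,} bits")
```

The output is several orders of magnitude smaller than the settings input, which is why the text points to biased settings choices or a public randomness source.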

For future work, we hope to take advantage of the adaptive
capabilities of the entropy production theorem (Supplementary Information
section 2) to compensate for experimental drift dynamically during
run time. In view of advances towards practical quantum computing,
it is desirable to study the protocol when experimental devices
may have long-term quantum memories and remain entangled with
Eve after the protocol has begun. This may require more conservative
randomness generation.

Fig. 2 | Extractable bits as a function of error. The figure shows the
trade-off between the final error ε_fin and the number of extractable bits t
for values of V_thresh pre-chosen to yield estimated passing probabilities that
exceed 95%. These thresholds were met in each case. For all datasets (1–5)
we set ε_p = κ² = (0.95ε_fin)² and ε_ext = 0.05ε_fin, a split that was generally
found to be near-optimal when numerically maximizing t in equation (5)
for fixed values of ε_fin. The numbers of trials for datasets 1–5 were
n_1 = 24,865,320, n_2 = 24,809,970, n_3 = 24,818,959, n_4 = 24,846,822 and
n_5 = 55,110,210.

With the advent of loophole-free Bell tests, we have demonstrated 
that it is possible to build quantum devices that exploit quantum 
non-locality to remove many of the device-dependent assumptions in 
current technological implementations of random-number generators. 
Generators such as ours provide the best method currently known for 
physically producing randomness, thereby improving the security of a 
wide range of applications. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0019-0.
METHODS


We used polarization-entangled photons generated by a nonlinear crystal pumped
by a pulsed, picosecond laser at approximately 775 nm in a configuration similar
to that reported previously", but with several improvements to increase the rate
of randomness extraction. The repetition rate of the laser was 79.3 MHz and each
pulse that entered the crystal had a probability of approximately 0.003 of creating
an entangled photon pair in the state |ψ⟩ ≈ 0.982|HH⟩ + 0.191|VV⟩ at a centre
wavelength of 1,550 nm. By pumping the crystal with approximately five times as
much power, and using a 20-mm-long crystal, we were able to increase the
per-pulse probability of generating a down-conversion event substantially compared
with the previous configuration!* while maintaining similar overall system
efficiencies. The two entangled photons from each pair were sent separately to one of
the two measurement stations, which were 187 ± 1 m apart. At Alice and Bob, a
Pockels cell and a polarizer combined to allow the rapid switching of measurement
bases and the measurement of the polarization state of the incoming photons.
Alice’s computed optimal polarization measurement angles, relative to a vertical
polarizer, were a = −3.7° and a′ = 23.6°, and Bob’s were b = 3.7° and b′ = −23.6°.
Each Pockels cell operated at a rate of 100 kHz, allowing us to perform 100,000
trials per second (the driver electronics on the Pockels cells set this rate). A 10-MHz
oscillator kept Alice’s and Bob’s time-tagger clocks locked. After passing through
the polarization optics, the photons were each coupled into a single-mode fibre
and detected using superconducting nanowire single-photon detectors, with Bob’s
detector operating at approximately 90% efficiency and Alice’s detector operating
at approximately 92% efficiency*’. For this experiment, the total symmetric
system heralding efficiency was 75.5% ± 0.5%, which is greater than the 71.5%
threshold that is required to close the detection loophole for our experimental
configuration after accounting for unwanted background counts at our detectors
and slight imperfections in our state-preparation and measurement components.

With this configuration, Bob completed his measurement 294.4 ± 3.7 ns before a
hypothetical switching signal travelling at light speed from Alice’s Pockels cell could
arrive at his station. Similarly, Alice completed her measurement 424.2 ± 3.7 ns
before such a signal from Bob’s Pockels cell could arrive at her location. The
outcome values for each trial were obtained by aggregating the photon detection or
non-detection events from several short time intervals, each lasting 1,024 ps and
timed to correspond to one pulse of the pump laser. If any photons were detected
in the short intervals, then the outcome was ‘+’; if no photons were detected, then
the outcome was ‘0’. The previous experiment! used at most 7 short intervals, but
here we were able to include 14 intervals while maintaining space-like separation,
which further increased the probability of observing a photon during each trial.
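The outcome rule is a simple aggregation over the pulse-synchronized intervals of a trial. The sketch below also shows, under the simplifying assumption that each interval independently registers a photon with the same small probability p, why moving from 7 to 14 intervals raises the per-trial detection probability. The value p = 0.003 borrows the per-pulse pair-creation probability quoted above and ignores losses, so it is illustrative only.

```python
def trial_outcome(detections) -> str:
    """'+' if a photon was detected in any short interval of the trial, otherwise '0'."""
    return "+" if any(detections) else "0"


def prob_any_detection(n_intervals: int, p: float) -> float:
    """P(at least one detection) for independent intervals with per-interval probability p."""
    return 1.0 - (1.0 - p) ** n_intervals


print(trial_outcome([0] * 13 + [1]), trial_outcome([0] * 14))  # + 0
for k in (7, 14):
    print(k, round(prob_any_detection(k, 0.003), 4))
```

For small p the probability grows almost linearly with the number of intervals, so doubling the intervals roughly doubles the chance of a ‘+’ outcome.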
For demonstration purposes, Alice and Bob each used Python's random.py module 
with the default generator (the Mersenne twister) to pick their settings at each trial. 
This pseudorandom source is predictable, and for secure applications of the 
protocol in an adversarial scenario, such as if the photon pair source or measure- 
ment devices are obtained from an untrusted provider, settings choices must be 
based on random sources that are effectively not predictable. However, from our 
knowledge of device construction, we know that our devices have no physical 
resources for predicting pseudorandom numbers and expect that the measurement 
settings were effectively independent of the relevant devices so that equations (2) 
and (3) still hold. We remark that the settings choices for the previous datasets? 
were based on physical random sources. 
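The following Python sketch illustrates why a Mersenne twister source is unsuitable in an adversarial setting: anyone who obtains the generator's internal state (here copied with getstate, standing in for an adversary who has reconstructed it) reproduces every later setting choice. The seed value and the helper name are hypothetical; the text does not say how the generators were seeded.

```python
import random


def next_settings(rng: random.Random, n_trials: int) -> list[int]:
    """Draw one binary measurement setting per trial from the given generator."""
    return [rng.getrandbits(1) for _ in range(n_trials)]


alice_rng = random.Random(2018)      # hypothetical seed
next_settings(alice_rng, 1_000)      # settings already used in earlier trials
leaked_state = alice_rng.getstate()  # what an adversary would need to learn

future = next_settings(alice_rng, 20)

eve_rng = random.Random()
eve_rng.setstate(leaked_state)
predicted = next_settings(eve_rng, 20)
print(predicted == future)  # True: the future settings are fully predictable
```

For MT19937 the internal state can also be reconstructed from 624 consecutive 32-bit outputs, so observing enough past outputs is sufficient in principle.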

With the improved detection efficiency, the higher per-trial probability for Alice 
and Bob to detect a photon, and a higher signal-to-background counts ratio, we 
are able to improve the magnitude of our Bell violation and to reduce the number 
of trials that are required to achieve a statistically significant violation by an order 
of magnitude. 

Sample size. No statistical methods were used to predetermine sample size. 
Data availability. The photon detection data that support the findings of this study
are available in the NIST Published Data Repository (https://doi.org/10.18434/T4/1423448).
Anomalously weak Labrador Sea convection and
Atlantic overturning during the past 150 years 


David J. R. Thornalley, Delia W. Oppo, Pablo Ortega, Jon I. Robson, Chris M. Brierley, Renee Davis, Ian R. Hall,
Paola Moffa-Sanchez, Neil L. Rose, Peter T. Spooner, Igor Yashayaev & Lloyd D. Keigwin


The Atlantic meridional overturning circulation (AMOC) is 
a system of ocean currents that has an essential role in Earth’s 
climate, redistributing heat and influencing the carbon cycle’. 
The AMOC has been shown to be weakening in recent years!; 
this decline may reflect decadal-scale variability in convection 
in the Labrador Sea, but short observational datasets preclude 
a longer-term perspective on the modern state and variability of 
Labrador Sea convection and the AMOC!**. Here we provide 
several lines of palaeo-oceanographic evidence that Labrador Sea 
deep convection and the AMOC have been anomalously weak over 
the past 150 years or so (since the end of the Little Ice Age, LIA, 
approximately AD 1850) compared with the preceding 1,500 years.
Our palaeoclimate reconstructions indicate that the transition 
occurred either as a predominantly abrupt shift towards the end of 
the LIA, or as a more gradual, continued decline over the past 150 
years; this ambiguity probably arises from non-AMOC influences 
on the various proxies or from the different sensitivities of these 
proxies to individual components of the AMOC. We suggest that 
enhanced freshwater fluxes from the Arctic and Nordic seas towards 
the end of the LIA—sourced from melting glaciers and thickened 
sea ice that developed earlier in the LIA—weakened Labrador Sea
convection and the AMOC. The lack of a subsequent recovery may 
have resulted from hysteresis or from twentieth-century melting of 
the Greenland Ice Sheet®. Our results suggest that recent decadal 
variability in Labrador Sea convection and the AMOC has occurred 
during an atypical, weak background state. Future work should 
aim to constrain the roles of internal climate variability and early 
anthropogenic forcing in the AMOC weakening described here.

The AMOC comprises northward transport of warm surface and
thermocline waters, and their deep southward return flow as dense 
waters that formed through cooling processes and sinking at high 
latitudes”. The stability of the AMOC in response to ongoing and pro- 
jected climate change is uncertain. Monitoring of the AMOC during 
the past decade with an instrument array at 26° N has suggested that 
the AMOC is weakening, and that this is occurring ten times faster 
than would be expected from climate model projections!. However, it 
remains uncertain whether this trend is part of a longer-term decline, 
natural multidecadal variability, or a combination of both. Here, we 
develop past reconstructions of AMOC variability that can be compared 
directly with instrumental datasets and provide longer-term perspective. 
The Labrador Sea is an important region for deep-water formation 
in the North Atlantic ocean°. Moreover, modelling studies suggest 
that deep-Labrador-Sea density (DLSD) might be a useful predictor of 
AMOC change**”. This is because density anomalies produced in the 
Labrador Sea—caused predominantly by varying deep convection— 
can propagate southwards rapidly (on timescales of the order of 
months) along the western margin via boundary waves, altering the 
cross-basin zonal density gradient, and thus modifying geostrophic 
transport and therefore AMOC strength?-*”~*. Building upon these 
studies, we show that DLSD anomalies are also associated with changes
in the velocity of the deep western boundary current (DWBC) and the
strength of the AMOC at 45° N in the high-resolution climate model 
HadGEM3-GC2 (see Methods and Fig. 1). 

In addition to this link between the AMOC, DLSD and DWBC, 
changes in the AMOC also alter ocean heat transport. Modelling 
studies’? suggest that AMOC weakening affects the upper-ocean heat 
content of the subpolar gyre (SPG) with a lag time of around ten years. 
Moreover, a distinct AMOC fingerprint on subsurface temperatures 
(T_sub; at depths of 400 m)!! characterizes weak AMOC phases, with
a dipole pattern of warming of the Gulf Stream extension region”
and cooling of the subpolar Northeast Atlantic. We exploit here the
model-based covariance of decadal changes in the AMOC with DLSD
anomalies, SPG upper-ocean heat content and the T_sub fingerprint, to
extend constraints on past AMOC variability (see Methods). Over
the instrumental era (from AD 1950 or so), these indices suggest
substantial decadal variability in the AMOC, with coherent changes in
DLSD, lagged SPG upper-ocean heat content and a lagged T_sub AMOC
fingerprint.

The model results in Fig. 1 suggest that we can use flow-speed 
reconstructions of the DWBC to infer past changes in DLSD and the 
AMOC. We analysed the sortable-silt mean grain size—a proxy for 
near-bottom current flow speed'*—in two marine sediment cores 
(48JPC and 56JPC; see Methods and Extended Data Figs. 1, 2) located 
under the influence of southward-flowing Labrador Sea Water (LSW) 
within the DWBC off Cape Hatteras (hereafter DWBC_LSW). The high
sediment-accumulation rates (about 0.5–1 cm per year) and the
modern core-top enable direct comparison of the record from 56JPC 
with observational datasets (Fig. 2). 

In agreement with the model-predicted relationship (Fig. 1), changes 
in the inferred flow speed of the DWBC_LSW show similar, in-phase,
variability with observed DLSD®. Moreover, there is strong covariability
of our DWBC_LSW proxy with the lagged (12-year) SPG upper-ocean
heat content and T_sub index from observational analysis (Fig. 2a). Over
the past 100 years or so, the spatial correlation of upper-ocean heat
content anomalies associated with our DWBC_LSW proxy has closely
resembled the T_sub AMOC fingerprint (Fig. 2b, c), supporting the
concept that the DWBC_LSW proxy and upper-ocean temperature
changes provide complementary, coherent information on a common 
phenomenon, namely, AMOC variability. Combined, these datasets 
indicate that decadal variability has been a dominant feature of the 
past 130 years, with the most recent strengthening of LSW formation 
during the mid-1990s, and its subsequent decline, being particularly 
prominent features. 

To gain insight into variability before the instrumental era, we first
extended our DWBC_LSW flow-speed reconstruction (Fig. 3e). The
DWBC_LSW proxy suggests that the AMOC has been weaker during
the past 150 years than at any other time during the past 1,600 years.
The emergence of this weaker state (during which the smoothed
record exceeds a noise threshold of 2σ of pre-industrial-era variability)
takes place at about AD 1880 in both cores. The overall transition
occurs from about AD 1750 to AD 1900, late in the Little Ice Age
(AD 1350–1850) and during the early stages of the industrial era (1830
onwards!*). Applying the flow-speed calibration for sortable silt!’
suggests a decrease from 17 cm s⁻¹ to 14.5 cm s⁻¹ in core 56JPC during this
transition period, and from 14 cm s⁻¹ to 12 cm s⁻¹ in 48JPC, suggesting
a decrease in DWBC_LSW strength of approximately 15% (assuming a
constant DWBC_LSW cross-sectional area). This decrease is equivalent
to 3σ and 4σ of the pre-industrial-era variability in 48JPC and 56JPC,
respectively.

Fig. 1 | Modelled link between DWBC velocity, deep Labrador Sea
density and the AMOC. a, Correlation (colour bar on the right) between
the vertically averaged ocean density (1,000 m to 2,500 m) and DLSD
(average ocean density between 1,000 m and 2,500 m in the area defined
by the green box), as modelled using a control run of the high-resolution
climate model HadGEM3-GC2. The locations of the sediment-core sites
used for DWBC flow-speed reconstruction are also shown. b, Climatology
of the modelled meridional ocean velocity (in metres per second; see
colour bar) at 30° N to 35° N (see Methods and Extended Data Figs. 7, 8),
illustrating the modelled position of the DWBC (red outline). The y axis
shows water depth in metres. c, Cross-correlations between the modelled
average DWBC flow speed from the red box in panel b, and indices of
DLSD and the AMOC at 45° N (the dashed blue line omits the Ekman
component).

Second, we compiled quantitative proxy records of subsurface ocean
temperatures (at depths of about 50–200 m) from key locations to
extend the T_sub AMOC proxy (Fig. 3a–c; see Methods and Extended
Data Figs. 3, 4). This T_sub proxy reconstruction provides support for
the proposed AMOC weakening. Opposing temperature anomalies
recorded in the two regions after about AD 1830, with warming of the
Gulf Stream extension region and cooling of the subpolar Northeast
Atlantic region, suggest a weaker industrial-era AMOC. Further
support for the AMOC weakening is suggested by the spatial pattern of T_sub
change in the Northwest Atlantic during the onset of the industrial era
(Extended Data Fig. 5). In contrast to the prominent changes recorded
in our proxy reconstructions at the end of the LIA, more subdued
variability occurs during the earlier part of our records (AD 400–1800). This
might suggest that the forcing and AMOC response were weaker then,
or that the AMOC did not play a leading role in the (multi)centennial
climate variability of this period’>'*.

Fig. 2 | Proxy validation and recent multidecadal variability. a, The
mean grain size of sortable silt (SS; from sediment core 56JPC; blue) is
compared with: the central-Labrador-Sea annual density” (black; r² = 0.56;
n = 54), which is comparable to the model-based DLSD (Extended Data
Fig. 9); the 12-year lagged SPG upper-ocean heat content (at 0–700 m;
55° N to 65° N, 15° W to 60° W; EN4 dataset; red; r² = 0.58; n = 116); and the
12-year lagged T_sub AMOC fingerprint’! (brown; dashed line shows the
zero line; r² = 0.76; n = 55). Correlations (and the 2σ SS error bar; n = 30)
are for three-point means (thicker lines). Low-resolution 48JPC data are
not shown. b, 10- and 12-year lagged spatial correlation (colour bar; R)
of upper-ocean heat content (at 0–700 m) with reconstructed DWBC_LSW
flow speed (from sediment core 56JPC); the heat content lags behind
the DWBC. Grey contours show the spatial T_sub AMOC proxy"; green
triangles show T_sub proxy sites; the green circle marks the surface region
controlling benthic temperatures at site 7; grey circles are DWBC sites; the
grey star marks the core site from ref. '”.

Labrador Sea deep convection is a major contributor to the AMOC,
but susceptible to weakening. This fact, combined with its role in
decadal AMOC variability over the past 100 years or so (Fig. 2) and
model analysis of mechanisms for AMOC variability in operation
today®, makes it likely that changes in Labrador Sea convection were
involved in the weakening of the AMOC at the end of the LIA. Further
correlative (although not necessarily causative) support for this idea
is revealed by palaeo-oceanographic evidence from the Labrador Sea.
Strong deep convection in the Labrador Sea is typically associated
with cooling and freshening of the subsurface ocean®. Therefore, the
reconstructed shift to warmer and saltier subsurface conditions in the
northeast Labrador Sea!” over the past 150 years (Fig. 3d; equivalent
to around 2σ of pre-industrial-era variability) is consistent with a shift
to a state characterized by reduced deep convection, with only
occasional episodes of sustained deep convection. Reconstructions of the
other major deep-water contributors to the AMOC, the two Nordic
Seas overflows, suggest that, on centennial timescales, they have
varied in anti-phase and probably therefore compensated for one
another during the past 3,000 years'®. Hence, changes in Labrador Sea
deep convection may have been the main cause of AMOC variability
over this period.
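The approximately 15% figure follows directly from the calibrated flow speeds, and the emergence criterion can be phrased as the first time the smoothed record leaves a ±2σ band around its pre-industrial mean. The Python sketch below implements both under those readings; the synthetic series in the usage example is invented for illustration and is not core data.

```python
from statistics import fmean, pstdev


def percent_decrease(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in per cent."""
    return 100.0 * (before - after) / before


def emergence_index(smoothed, n_baseline, k_sigma=2.0):
    """Index of the first post-baseline value lying more than k_sigma standard
    deviations from the baseline mean, or None if the record never emerges."""
    baseline = smoothed[:n_baseline]
    mu, sigma = fmean(baseline), pstdev(baseline)
    for i in range(n_baseline, len(smoothed)):
        if abs(smoothed[i] - mu) > k_sigma * sigma:
            return i
    return None


print(round(percent_decrease(17.0, 14.5), 1), round(percent_decrease(14.0, 12.0), 1))  # 14.7 14.3

# Synthetic example: an alternating baseline followed by a decline.
series = [10.0, 10.5] * 20 + [10.2, 10.0, 9.6, 9.0]
print(emergence_index(series, n_baseline=40))  # 42
```

Both cores give a decrease of roughly 14 to 15%, consistent with the approximately 15% quoted in the text.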

Although atmospheric circulation has played a dominant part in recent 
decadal variability in the AMOC (and LSW)%, there is no strong evi- 
dence that the AMOC decrease at the end of the LIA was similarly caused 
by a shift in atmospheric circulation’’. Instead, we hypothesize that the 
AMOC weakening was caused by enhanced freshwater fluxes associated 
with the melting and export of ice and freshwater from the Arctic and 
Nordic seas. During the LIA, circum-Arctic glaciers and multiyear Arctic 
and Nordic sea ice were at their most advanced state of the past few thou- 
sand years, and there were large ice shelves in the Canadian Arctic and 
exceptionally thick multiyear sea ice. Yet, by the early twentieth century, 
many of these features had disappeared or were retreating””*?. 

Modelling studies suggest that enhanced freshwater fluxes of about 
10–100 mSv over a few decades can weaken Labrador Sea convection
and the AMOC™, although models with strong hysteresis of Labrador
Sea convection”? suggest that this weakening may be caused by as little
as 5–10 mSv of freshwater. Unfortunately, there are few data to constrain
the Arctic and Nordic Sea freshwater fluxes associated with the end of the 
LIA. The earliest observational datasets**”” suggest that a flux of about 
10 mSv resulted from sea-ice loss in the Arctic and Nordic seas during 
1895-1920, to which we must also add melting of previously expanded 
circum-Arctic glaciers and ice shelves, and enhanced melting of the 
Greenland Ice Sheet. Alternatively, we could estimate that a 1-m reduc- 
tion in average Arctic sea-ice thickness during the termination of the LIA 
could have yielded a freshwater flux of 10 mSv for 50 years. Although 
further work is required to improve this incomplete estimate, there 
was probably sufficient freshwater stored in the Arctic and Nordic seas 
during the LIA to influence Labrador Sea convection and the AMOC. 
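The sea-ice estimate can be checked with unit conversions (1 Sv = 10^6 m³ s⁻¹). The sketch below converts a freshwater flux sustained for a given period into a volume, and into the area that a 1-m-thick layer of that volume would cover. Comparing the result with the area of the Arctic Ocean (roughly 14 million km²) is our own order-of-magnitude context, and the calculation ignores the difference between sea-ice volume and its freshwater equivalent.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year
M3_PER_S_PER_MSV = 1.0e3               # 1 mSv = 10^-3 Sv = 10^3 m^3 s^-1


def freshwater_volume_m3(flux_msv: float, years: float) -> float:
    """Total freshwater volume delivered by a constant flux (in mSv) over `years`."""
    return flux_msv * M3_PER_S_PER_MSV * years * SECONDS_PER_YEAR


def layer_area_km2(volume_m3: float, thickness_m: float = 1.0) -> float:
    """Area covered by the volume spread as a layer of the given thickness."""
    return volume_m3 / thickness_m / 1.0e6  # m^2 -> km^2


v = freshwater_volume_m3(10.0, 50.0)
print(f"{v / 1e9:,.0f} km^3, equivalent to a 1 m layer over {layer_area_km2(v) / 1e6:.1f} million km^2")
```

A 1 m thinning over an area of that order is therefore consistent with the text's figure of 10 mSv sustained for 50 years.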

The AMOC weakening recorded in our two marine reconstructions 
is broadly similar to that in a predominantly terrestrial-based AMOC 
proxy reconstruction® (Fig. 3c). Our T_sub AMOC proxy and that of ref. ©
(Fig. 3c) both suggest a substantial decline in the AMOC through
the twentieth century, whereas our DWBC_LSW AMOC proxy and the
observational-based T_sub AMOC index (Fig. 2a and Extended Data
Fig. 6) suggest relatively little long-term AMOC decline during this
period. These differences may be attributed to several factors. First,
our sediment-core-based T_sub proxy is subject to artificial smoothing,
caused by combining numerous records with substantial (around
10–100-year) individual age uncertainties, and compounded by
sediment mixing by organisms (bioturbation). Furthermore, the T_sub proxy
sediment cores were retrieved in the late 1990s and early 2000s, and
so cannot capture the strong T_sub index recovery from around 2000 to
2010 that reverses the earlier prolonged decline (Extended Data Fig. 6).
Alternatively, the earlier, more threshold-like change in the DWBC_LSW
AMOC proxy may be due to local shifts in the position of the DWBC
and/or nonlinear dynamics of the DWBC response to AMOC change.
However, given the similarity of the DWBC_LSW reconstructions from
cores 56JPC and 48JPC (located at different water depths), and the
strong correlation of DWBC_LSW with Labrador Sea density and the T_sub
AMOC index over the instrumental period, we suggest that these factors
are not substantial. Finally, the differences between the AMOC
reconstructions may reflect their varying response timescales and sensitivities
to the different individual components of the AMOC and the SPG”®”?.

Our study raises several issues regarding the modelling of the AMOC 
in historical experiments. The inferred transition to a weakened AMOC 
occurred near the onset of the industrial era, several decades before 
the strongest global warming trend, and has remained weak up to the 
present day. This suggests either hysteresis of the AMOC in response 
to an early climate forcing—natural (solar, volcanic) or anthropogenic 
(greenhouse gases, aerosols, land-use change)—or that continued 
climate forcing, such as the melting of the Greenland Ice Sheet®, has 
been sufficient to keep the AMOC weak or cause further weakening. 


LETTER 


[Fig. 3 plot: panels a–e plotted against Year (AD 400–2000). a, Northwest Atlantic shelf; b, Northeast SPG; c, T_sub AMOC proxy (stronger/weaker AMOC); d, northeast Labrador Sea subsurface temperature and salinity; e, sortable silt. Marked intervals: DACP, MCA, LIA; pre-industrial and industrial eras.]


Fig. 3 | Proxy reconstructions of AMOC changes over the past 1,600 
years. a, b, Subsurface Northwest Atlantic shelf (a) and Northeast 
Atlantic SPG (b) temperatures, taken at sites shown in Fig. 2b. Composite 
stacks are in black. c, Black and grey, our T_sub AMOC proxy with
different types of binning (see Extended Data Fig. 4). Orange, AMOC
proxy from Rahmstorf et al.°; 1 °C = 2.3 Sv; thin line, 21-year smoothing;
thick line and symbols, binned as for our T_sub AMOC proxy. NE SPG,
Northeast Atlantic subpolar gyre; NW shelf, Northwest Atlantic shelf;
NH, Northern Hemisphere; sub, subsurface; surf, surface. d, Subsurface
(around 100–200 m) temperature and salinity of the northeast Labrador
Sea, based on Mg/Ca–δ¹⁸O analysis of the planktic foraminifera
Neogloboquadrina pachyderma’. e, Sortable silt (SS) mean grain size.
Blue, core 56JPC; purple, 48JPC; bold, three-point means; dashed lines,
industrial/pre-industrial era averages; error bars/shading, ±2 s.e. DACP,
Dark Ages Cold Period (around AD 400–800); MCA, Mediaeval Climate
Anomaly (around AD 900–1250).
Our reconstructions also differ from most climate model simulations,
which show either negligible AMOC change or a later, more gradual 
reduction*’. Many factors may be responsible for this model-data dis- 
crepancy: a misrepresentation of AMOC-related processes and possible 
hysteresis, including underestimation of AMOC sensitivity to climate 
(freshwater) forcing”®?!; the underestimation or absence of important 
freshwater fluxes during the end of the LIA; and the lack of transient 
forced behaviour in the ‘constant forcing’ pre-industrial controls used 
to initialize historical forcings. Resolving these issues will be important 
for improving the accuracy of projected changes in the AMOC. 

In conclusion, our study reveals an anomalously weak AMOC over 
the past 150 years or so. Because of its role in heat transport, it is often 
assumed that AMOC weakening cools the Northern Hemisphere. 
However, our study demonstrates that changes in the AMOC are not 
always synchronous with temperature changes. That AMOC weak- 
ening occurred during the late LIA and onset of the industrial era, 
rather than earlier in the LIA, may point to additional forcing factors 
at this time, such as an increase in the export of thickened Arctic and 
Nordic sea ice, or the melting of circum-Arctic ice shelves. The per- 
sistence of a weak AMOC during the twentieth century, when there 
was pronounced Northern Hemisphere and global warming, suggests 
that other climate forcings—such as greenhouse gas warming—were 
dominant during this period. We therefore infer that the AMOC has 
responded to recent centennial-scale climate change, rather than driven 
it. Regardless, the weak state of the AMOC over the past 150 years may 
have modified northward ocean heat transport, as well as atmospheric 
warming by altering ocean—atmosphere heat transfer*”3, underscoring 
the need for continued investigation of the role of the AMOC in climate 
change. Determining the future behaviour of the AMOC will depend in 
part on constraining its sensitivity and possible hysteresis to freshwater 
input, for which improved historical estimates of these fluxes during the 
AMOC weakening reported here will be especially useful. 
METHODS

Climate model investigation of AMOC and DWBC changes. The climate model 
used here was the UK Met Office’s Global Coupled model 2.0 (HadGEM3-GC2). 
The ocean model for HadGEM3-GC2 is Global Ocean version 5.0, which is based 
on version 3.4 of the Nucleus for European Models of the Ocean model (NEMO)*. 
The ocean model has 75 vertical levels and is run at a nominal 1/4° resolution 
using the NEMO tri-polar grid. The atmospheric component is Global Atmosphere 
version 6.0 of the UK Met Office Unified Model, and is run at N216 resolution 
(around 60 km in mid-latitudes), with 85 vertical levels. More information about 
the model can be found in ref. 3°. The experiment analysed here was a 310-year 
control simulation of HadGEM3-GC2—that is, it includes no changes in external 
forcings. This experiment was previously run and analysed in ref. °, where details 
of the specific model experiment are included. This coupled simulation has a 
relatively high spatial resolution for a more accurate representation of the boundary 
currents, and is sufficiently long to resolve a large number of decadal oscillations. 
All model data have been linearly detrended to remove any potential drift, and 
smoothed with a 10-year running mean in order to focus on the decadal and 
multidecadal variability. 
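The preprocessing described above (linear detrending, then a 10-year running mean) can be sketched with NumPy as follows. The function name and the "valid" edge handling, which drops the partial windows at both ends, are our own choices; the text does not say how the ends of the series were treated.

```python
import numpy as np


def detrend_and_smooth(annual: np.ndarray, window: int = 10) -> np.ndarray:
    """Remove a least-squares linear trend, then apply a `window`-year running mean.

    Returns len(annual) - window + 1 values (mode="valid" keeps only full windows).
    """
    annual = np.asarray(annual, dtype=float)
    years = np.arange(annual.size)
    slope, intercept = np.polyfit(years, annual, 1)  # degree-1 fit = linear trend
    anomalies = annual - (slope * years + intercept)
    kernel = np.ones(window) / window
    return np.convolve(anomalies, kernel, mode="valid")


# Synthetic 310-year index: linear drift plus a 60-year oscillation.
t = np.arange(310)
index = 0.01 * t + np.sin(2 * np.pi * t / 60)
smoothed = detrend_and_smooth(index)
print(smoothed.shape)  # (301,)
```

Detrending first matters: a running mean alone would leave any model drift in place and could produce spurious correlations between indices that share the same drift.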

We use the model-based relationships to support our interpretation of the 
proxy-based AMOC reconstructions, which cannot be validated with the limited 
observations available. We chose the AMOC at 45° N because this is the latitude 
with the largest correlations with both the DLSD and the DWBC velocity index in 
the model. AMOC indices defined at other latitudes (for example, 35° N or 40° N) 
produce weaker, but still substantial, correlations with both DLSD and the DWBC. 
The simulated DWBC velocity index is the average of that at 30° N to 35° N, 
because at 35° N (where the sediment cores were taken) the DWBC is found 
offshore, which we believe is associated with the model’s Gulf Stream separating 
further north than in the observations (Extended Data Fig. 7). We note, however, 
that changes in the position of the observed Gulf Stream do not appear to directly 
control the reconstructed flow-speed changes in the DWBC_LSW (Extended
Data Fig. 10). 

We have also assessed the robustness of the model-based relationships to the 
smoothing. For example, we reproduced the cross-correlation analysis in Fig. 1c 
using undetrended and/or unsmoothed data instead. In all cases, the lead-lag 
relationships are similar, with larger correlations emerging when the decadal 
smoothing is applied. We also tested the sensitivity of the model-based
relationships to the specific model used. In particular, we repeated the analysis
of Fig. 1 in the 340-year control experiment using the HiGEM climate model.
HiGEM has a similar horizontal ocean resolution (1/3°), but is based on a different
ocean model. Encouragingly, Extended Data Fig. 8 shows that the results are
consistent across the two models, in particular the link between DLSD and the 
DWBC, and between the DWBC and the AMOC at 45° N. However, there are some 
caveats. For example, both models’ Gulf Streams separate too far north, which led 
us to define the DWBC flow indices slightly south of the core sites. HiGEM also 
has a deeper DWBC than HadGEM3-GC2. Therefore, the DWBC index was
computed at different depth levels in the two models to represent the link between DLSD
and the DWBCs. Despite these differences, however, both models support the
general interpretation that the DWBC in the vicinity of Cape Hatteras is strongly 
connected with changes in the DLSD and the AMOC. 
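The lead-lag cross-correlation used to compare indices (as in Fig. 1c) can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the inputs are assumed to be equal-length annual series that have already been detrended and smoothed, and a positive lag here means that the first series leads the second. The example uses synthetic data in which one series is an exact 3-step-delayed copy of another.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, max_lag):
    """Correlate x(t) with y(t + lag) for lag in [-max_lag, max_lag].

    A peak at a positive lag means x leads y by that many time steps.
    """
    result = {}
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result


# Synthetic example: "deep_density" leads "boundary_flow" by 3 steps.
rng = random.Random(42)
deep_density = [rng.gauss(0.0, 1.0) for _ in range(200)]
boundary_flow = [0.0] * 3 + deep_density[:-3]
corr = lagged_correlation(deep_density, boundary_flow, max_lag=10)
best_lag = max(corr, key=corr.get)
```

With real smoothed indices, neighbouring lags are strongly autocorrelated, so the peak is broad rather than a single spike as in this white-noise example.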

The interpretation of the model results is consistent with previously published 
model studies (both low and high resolution) that have revealed a coupling between 
the AMOC and/or Labrador Sea density, and the DWBC. These modelled
relationships support a causal link for the correlations between the instrumental 
records of Labrador Sea density and the reconstructed DWBC velocity, presented 
in Fig. 2. Furthermore, recent instrumental data for the DWBC at 39° N from 2004 
to 2014 reveal that a reduction in the velocity of classical LSW within the DWBC is 
also accompanied by a decrease in its density, as hypothesized here. The observed
decrease in the velocity and density of classical LSW within the DWBC between 
2004 and 2014 is also consistent with the decrease in the DLSD over this period 
(Fig. 2a and Extended Data Fig. 9), although a longer observational DWBC time 
series is needed to gain confidence in this relationship. 

Age models. New and updated age models for the cores are presented in Extended 
Data Figs. 1 and 2, and are based on ¹⁴C, ²¹⁰Pb and spheroidal carbonaceous
particle (SCP) concentration profiles.
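As a sketch of the constant initial concentration (CIC) approach mentioned in Extended Data Figs. 1 and 2: if every layer started with the same unsupported (excess) ²¹⁰Pb activity A₀, the age of a layer follows from radioactive decay alone, t = ln(A₀/A)/λ with λ = ln 2 / t½. The half-life of 22.3 years is a commonly used value and an assumption here, as are the activity values and the collection year in the example; the function names are hypothetical and this is not the laboratory's processing code.

```python
import math

PB210_HALF_LIFE_YR = 22.3  # assumed half-life of 210Pb, in years
DECAY_CONSTANT = math.log(2) / PB210_HALF_LIFE_YR  # per year


def cic_ages(excess_activity, initial_activity=None):
    """CIC ages (years before core collection) from excess 210Pb activities.

    By default the surface (first) sample is taken as the initial activity A0.
    """
    a0 = excess_activity[0] if initial_activity is None else initial_activity
    return [math.log(a0 / a) / DECAY_CONSTANT for a in excess_activity]


def to_calendar_years(ages, collection_year):
    """Convert ages before collection into calendar years (AD)."""
    return [collection_year - age for age in ages]


# Hypothetical excess-activity profile (arbitrary units), top of core first.
activities = [100.0, 50.0, 25.0, 12.5]
ages = cic_ages(activities)
years = to_calendar_years(ages, collection_year=2010)
```

Each halving of excess activity adds one half-life (about 22 years) of age, which is why ²¹⁰Pb is useful only for roughly the past century.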

Sortable silt data. We used two marine sediment cores for DWBC flow-speed 
reconstruction: KNR-178-56JPC (at 35° 28’ N, 74° 43’ W, 1,718 m water depth) 
and KNR-178-48JPC (35° 46’ N, 74° 27’ W, 2,009 m water depth). Sediments were 
processed using established methods, taking 1-cm-wide samples every 1 cm for
the top 63 cm and then every 4 cm down to 200 cm in 56JPC, and every 1 cm down
to 71 cm in 48JPC. Samples were analysed at Cardiff University on a Beckman
Coulter Multisizer 4 using the Enhanced Performance Multisizer 4 beaker and 
stirrer setting 30 to ensure full sediment suspension. Two or three separate 
aliquots were analysed for each sample, sizing 70,000 particles per aliquot. 
Analytical precision was approximately 1% (0.3 μm), while full procedural error
(based on replicates of about 25% of samples, starting from newly sampled bulk
sediment) was ±0.8 μm.
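The sortable-silt mean for one aliquot, and the combination of replicate aliquots into a sample value, can be sketched as below. The 10-63 μm size limits are the conventional definition of sortable silt and are assumed here rather than stated above; volume weighting of the Multisizer size bins is likewise an assumption, and the function names and example numbers are hypothetical.

```python
from statistics import mean, stdev

SS_MIN_UM, SS_MAX_UM = 10.0, 63.0  # assumed sortable-silt size limits (μm)


def sortable_silt_mean(diameters_um, volumes):
    """Volume-weighted mean diameter of the 10-63 μm fraction of one aliquot."""
    in_range = [(d, v) for d, v in zip(diameters_um, volumes)
                if SS_MIN_UM <= d <= SS_MAX_UM]
    total = sum(v for _, v in in_range)
    if total == 0:
        raise ValueError("no material in the sortable-silt range")
    return sum(d * v for d, v in in_range) / total


def combine_aliquots(aliquot_means):
    """Mean and sample standard deviation of replicate aliquot results."""
    spread = stdev(aliquot_means) if len(aliquot_means) > 1 else 0.0
    return mean(aliquot_means), spread


# Hypothetical size bins (μm) and relative volumes for one aliquot.
ss = sortable_silt_mean([5.0, 20.0, 40.0, 100.0], [10.0, 1.0, 1.0, 10.0])
sample_ss, sample_sd = combine_aliquots([20.0, 20.6, 20.3])
```

Material finer than 10 μm or coarser than 63 μm is ignored entirely, so large amounts of clay or sand in the example do not shift the result.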

Temperature data and constructing the T_sub index. Numerous studies have
suggested that AMOC variability is associated with a distinct surface or subsurface
(400 m) temperature fingerprint in the North Atlantic. However, the lack
of long-term observations of the AMOC prevents accurate diagnosis of the precise
AMOC temperature fingerprint, and models display a range of different AMOC
temperature fingerprints. Here we focus on the T_sub AMOC fingerprint,
proposed by Zhang on the basis of covariance between a modelled AMOC, the
spatial pattern of the leading mode of subsurface (400 m) temperature variability,
and sea-surface height changes. These model-based relationships are supported by
similar relationships (spatial and temporal) observed in recent instrumental data
of subsurface temperature and sea-surface height. The agreement between our
DWBC_LSW AMOC reconstruction, observed Labrador Sea density changes, and
the T_sub AMOC fingerprint supports our approach and suggests that
the T_sub AMOC fingerprint is capturing an important component of deep AMOC
variability. Differences between the various proposed AMOC temperature
fingerprints probably reflect their sensitivity to different aspects of the AMOC and heat
transport in the North Atlantic (for example, the AMOC versus SPG circulation);
the temperature response to each of these components may be resolved if more
comprehensive spatial networks of past North Atlantic temperature variability
are generated.
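The leading EOF of a temperature field, the type of analysis on which the T_sub fingerprint rests, can be sketched with power iteration on the spatial covariance matrix. This is a minimal illustration under stated assumptions, not Zhang's procedure: the grid is a plain list of points with no area weighting (which real analyses usually apply), the function name is hypothetical, and production code would normally use an SVD from a numerical library. EOF signs are arbitrary, so the pattern is defined only up to a factor of −1.

```python
import math
import random


def leading_eof(field, iterations=200, seed=0):
    """Return (EOF1 pattern, PC1 time series) of a time-by-space field.

    field: list of time steps, each a list of values at the same grid points.
    """
    n_t, n_x = len(field), len(field[0])
    # Anomalies: remove the time mean at each grid point.
    means = [sum(row[j] for row in field) / n_t for j in range(n_x)]
    anom = [[row[j] - means[j] for j in range(n_x)] for row in field]
    # Spatial covariance matrix, n_x by n_x.
    cov = [[sum(anom[t][i] * anom[t][j] for t in range(n_t)) / (n_t - 1)
            for j in range(n_x)] for i in range(n_x)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(n_x)]
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(n_x)) for i in range(n_x)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    # Principal component: projection of the anomalies onto the pattern.
    pc = [sum(anom[t][j] * vec[j] for j in range(n_x)) for t in range(n_t)]
    return vec, pc


# Synthetic field: one spatial pattern modulated by an oscillating amplitude.
pattern = [1.0, -2.0, 0.5, 3.0]
amplitude = [math.sin(0.4 * t) for t in range(60)]
field = [[a * p for p in pattern] for a in amplitude]
eof1, pc1 = leading_eof(field)
```

Because the synthetic field contains a single pattern, the recovered EOF1 should be parallel to `pattern` (up to sign).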

We selected records used in the OCEAN 2K synthesis from the Northwest
Atlantic slope and the subpolar Northeast Atlantic, and supplemented them with
additional records of past temperature variability in the subsurface
ocean of the chosen region. We excluded cores that did not have a modern core-top
age (AD 1950 or younger) or a resolution better than 100 years. We selected
foraminiferal-based temperature proxies because they record subsurface temperatures
(typically at 50-200 m depth), on which the T_sub proxy is based. We avoided
other temperature proxies (for example, alkenones, coccolithophores and diatoms) that
are typically more sensitive to sea-surface temperature than to T_sub, and
which also rely on the fine fraction; at the drift sites required for the necessary
age resolution, this fraction contains substantial allochthonous material, compromising the
fidelity of in situ temperature reconstruction.

We normalized all T_sub records to the interval AD 1750-2000 (the length of the
shortest records). We calculated the T_sub proxy reconstruction as the difference
between the stacked temperature records of the Northwest and Northeast Atlantic. 
Our results are insensitive to the precise binning or stacking method (Extended 
Data Fig. 4). The sedimentation rates of the cores used, combined with the effects 
of bioturbation, mean we cannot resolve signals on timescales shorter than about 
20-50 years. Age model uncertainty is estimated to be up to about 30 years for the 
past 150 years or so (where cores can be dated on the basis of ²¹⁰Pb signatures), and
around 100 years for AD 400-1800 (where ¹⁴C dating is relied upon). Therefore,
the optimal bin intervals chosen were 50 years for AD 1800-2000, and 100 years 
for AD 400-1800. Results using just 50-year and 100-year bins, as well as 30-year 
bins for the top 200 years, are shown in Extended Data Fig. 4. 
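The normalization, binning, stacking and differencing steps can be sketched as follows. Several details are assumptions rather than statements from the text: 'normalized' is taken to mean anomalies relative to each record's AD 1750-2000 mean, each record is bin-averaged before stacking, the stack is a simple mean of the records available in each bin, and the last bin includes its upper edge. Records are (year AD, temperature) pairs; the function names and values are hypothetical.

```python
from statistics import mean

# Optimal bins from the text: 100-year bins for AD 400-1800, 50-year bins for AD 1800-2000.
BIN_EDGES = list(range(400, 1800, 100)) + list(range(1800, 2001, 50))


def anomalies(record, ref_start=1750, ref_end=2000):
    """Express a record as anomalies from its mean over the reference interval."""
    ref_mean = mean(v for year, v in record if ref_start <= year <= ref_end)
    return [(year, v - ref_mean) for year, v in record]


def bin_record(record, edges=BIN_EDGES):
    """Average a record within each bin [lo, hi); the final bin also includes hi."""
    binned = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last = hi == edges[-1]
        values = [v for year, v in record
                  if lo <= year < hi or (is_last and year == hi)]
        if values:
            binned[(lo, hi)] = mean(values)
    return binned


def stack(binned_records):
    """Mean across records of each bin's value, using only records that have that bin."""
    bins = sorted(set().union(*binned_records))
    return {b: mean(r[b] for r in binned_records if b in r) for b in bins}


def tsub_index(northwest_records, northeast_records):
    """Difference between the Northwest and Northeast Atlantic stacks, per bin."""
    nw = stack([bin_record(anomalies(r)) for r in northwest_records])
    ne = stack([bin_record(anomalies(r)) for r in northeast_records])
    return {b: nw[b] - ne[b] for b in nw if b in ne}


# Hypothetical two-point records, just to exercise the functions.
nw_records = [[(1760, 1.0), (1990, 3.0)]]
ne_records = [[(1760, 5.0), (1990, 5.0)]]
index = tsub_index(nw_records, ne_records)
```

Only bins present in both stacks appear in the index, so gaps in one region's coverage simply drop out rather than being filled.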
Data availability. The proxy data that support these findings are provided as 
Source Data for Figs. 2 and 3 and Extended Data Figs. 1, 2, 4, 5, 6 and 9, and at the
National Geophysical Data Center (NGDC) Paleoclimatology database
(https://www.ncdc.noaa.gov/data-access/paleoclimatology-data/datasets). Model data are
available from J.I.R. (j.irobson@reading.ac.uk) upon reasonable request. 
Extended Data Fig. 1 | Age model for core KNR-178-56JPC. a, ¹⁴C and
²¹⁰Pb dating. The ¹⁴C ages (with 1σ ranges; grey, rejected dates) from
planktic foraminifera yield a modern core-top age and indicate an average
sedimentation rate over the past 1,000 years of 320 cm kyr⁻¹ (dashed
line). The presence throughout the core of abundant lithogenic grains in
the >150-μm fraction, along with the coarse sortable-silt mean grain
size values, suggests that some reworking of foraminifera has probably
occurred, resulting in average ¹⁴C ages that may be slightly (around
50 years) older than their final depositional age. This is consistent with the fact
that the ²¹⁰Pb dates do not splice smoothly into the ¹⁴C ages (the ¹⁴C ages
appear slightly too old). The final age model was therefore based on the
²¹⁰Pb ages for the past century, and was then simply extrapolated back in
time using the linear sedimentation rate of 320 cm kyr⁻¹. Because none
of our findings depend on close age control in the older section of this
core (that is, before AD 1880), this uncertainty (with converted ¹⁴C ages
being about 50 years older than the extrapolated linear age model) does
not affect our conclusions. b, Left, the age model for the top 80 cm of core
56JPC is based on ²¹⁰Pb dating of bulk sediment, using the constant initial
concentration (CIC) method (rejecting the date at 47 cm, which probably
indicates a burrow). A simple two-segment linear fit to the ²¹⁰Pb dates is
adopted (rather than point-to-point interpolation or a spline) because
sedimentological evidence (an abrupt increase in the percentage of coarse
fraction at 23 cm depth, not observed elsewhere in the core) indicates
a step change in the sedimentation rate. Horizontal dashed lines denote
the depths at which the sedimentation rate is inferred
to change. Centre, further support for the age model of 56JPC over the
past century comes from the down-core abundance profile of spheroidal
carbonaceous particles (SCPs, derived from high-temperature fossil
fuel combustion, counted as described previously), which ramped up from
the mid to late 1800s and peaked in the 1950s to 1970s (40 cm to 25 cm)
before declining over recent decades, consistent with the ²¹⁰Pb-based age
model. Right, the occurrence of ¹³⁷Cs in the top 40 cm or so of the core
is also consistent with the ²¹⁰Pb-based age of around 1950 at 40 cm.
The age uncertainty (1σ) for the past 60 years of the core is estimated
at 2-3 years. We note that the sediment core top is at 3 cm depth in the
core-liner.
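The extrapolation described in panel a amounts to converting extra depth into extra time at a constant sedimentation rate: 320 cm kyr⁻¹ is 0.32 cm per year, or 3.125 years per centimetre. The sketch below assumes a single anchor depth with a known calendar age; the anchor used in the example (around 1950 at 40 cm, the ²¹⁰Pb-based value quoted in panel b) is purely illustrative and not the tie point of the published age model, and the function name is hypothetical.

```python
def extrapolate_age(depth_cm, anchor_depth_cm, anchor_year_ad, rate_cm_per_kyr=320.0):
    """Calendar year (AD) at depth_cm, extrapolated below an anchor at a constant rate."""
    years_per_cm = 1000.0 / rate_cm_per_kyr  # 320 cm/kyr -> 3.125 years per cm
    return anchor_year_ad - (depth_cm - anchor_depth_cm) * years_per_cm


# Illustrative only: anchor of AD 1950 at 40 cm, then 32 cm deeper.
year_at_72cm = extrapolate_age(72.0, anchor_depth_cm=40.0, anchor_year_ad=1950.0)
```

Here 32 cm of extra depth corresponds to 100 years, giving AD 1850.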
Extended Data Fig. 2 | Age models for additional cores. a, ¹⁴C-based
age model, derived from linear interpolation of ¹⁴C-dated planktic
foraminifera (with 1σ ranges) in sediment core KNR-178-48JPC (used for
the DWBC_LSW sortable-silt reconstruction), yielding a modern core-top
age and an average sedimentation rate of around 50 cm kyr⁻¹. We note
that the core top is at 3 cm depth in the core-liner. The inset shows the SCP
profile for 48JPC on the basis of the ¹⁴C age model, confirming the modern
age of the top sediments, with SCPs showing the expected profile:
increasing in concentration from the late 1800s onwards, peaking at
around 1950 to 1970, and declining afterwards. b, Updated age model for
core KNR-158-10MC (after ref. *”; used in Extended Data Fig. 5 examining
regional near-surface temperature trends in the Northwest Atlantic during
the industrial era), using new ²¹⁰Pb dating (CIC method) for the top 7 cm
and rejecting the anomalously old ¹⁴C age at 4 cm depth; the inset shows
²¹⁰Pb age constraints in the top 8 cm. A single detectable occurrence of
¹³⁷Cs at 2-2.5 cm (equivalent to 1957 on the ²¹⁰Pb-based age model) can
be linked to the 1963 bomb peak, supporting the age model. SCPs
were also found in the top 5 cm of this core, confirming the industrial-era
age for the top 5 cm; however, their low concentrations prevent
meaningful interpretation of the down-core trends, which are therefore not shown.
c, Age model for core OCE-326-MC29B (used for the T_sub reconstruction of
the Northwest Atlantic shelf): ¹⁴C ages of planktic foraminifera (with 1σ
ranges), from ref. “8. Support for this age model is provided by the SCP
concentrations (inset; this study), which show the expected down-core
profile when plotted using the ¹⁴C ages. ²¹⁰Pb dating also suggests a
sedimentation rate of around 120 cm kyr⁻¹ for the uppermost sediments,
consistent with the ¹⁴C ages and SCP profile.
Extended Data Fig. 3 | Raw data for construction of the T_sub AMOC
proxy shown in Fig. 3. Locations are shown in Fig. 2b. a-c, Temperature
proxy records used for the Northwest Atlantic stack (Emerald Basin,
Laurentian Fan and Gulf of St Lawrence), where model studies
indicate that AMOC weakening results in warming of surface and
subsurface waters. d-g, Records used to reconstruct Northeast Atlantic
SPG subsurface temperatures: d, Gardar drift; e, combined South Iceland
data (Bjorn drift); f, Feni drift; g, Eastern North Atlantic Central
Water (ENACW), largely composed of waters formed in the eastern
SPG. h, The high-resolution alkenone sea-surface temperature (SST)
record from the North Iceland shelf was not included because it is not
located within the open North Atlantic SPG (although it does also show,
like the other Northeast Atlantic records, that the lowest temperature of
the past 1,600 years occurred during the most recent century). Also shown
for reference is the Rahmstorf central SPG SST reconstruction (based
largely on terrestrial proxies).
… solid lines and squares represent preferred binning (50 years for 1800-2000; …) … time.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Fig. 5 graphic. a, map panel titled (AMOC −ve) − (AMOC +ve), threshold ±3 Sv; colour scale: SST difference (°C), −3.2 to 3.2. b, down-core records including MC29; x axis: Age AD, 0-2000.]


LETTER 


Extended Data Fig. 5 | SST response of the Northwest Atlantic to
AMOC weakening. a, Modelled SST difference between a weak (negative)
and strong (positive) AMOC. This pattern is model-dependent; the
study cited here was chosen because of its good agreement with observations
of Gulf Stream variability. The locations of cores used for panel b are
shown by black stars. b, Percentage abundances of the polar species
N. pachyderma (sinistral) in marine sediment cores from the Northwest
Atlantic, as an indicator of near-surface (around 75 m) temperatures.
A 15% increase indicates around 1 °C of cooling (note the reversed
y axes). The opposing trends over the past 200 years are consistent with
the SST pattern modelled for a weakening of the AMOC, as shown in
panel a. Data and age models for the cores are: OCE326-MC29, using the
original ¹⁴C dating and as shown in Extended Data Fig. 2; OCE326-MC13
and OCE326-MC25, using the original ¹⁴C age ties at the top and
bottom of the core and scaling the intervening sedimentation rate to the
percentage of CaCO₃ content; KNR158-MC10, this study, using the
age model in Extended Data Fig. 2.
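The rule of thumb in panel b (a 15% increase in N. pachyderma (sinistral) corresponding to roughly 1 °C of cooling) can be applied to an abundance record as a linear conversion. Treating the relationship as linear over the whole range is an assumption made for this sketch; the function name and the example percentages are hypothetical.

```python
def pachyderma_to_temperature_change(percent_series, percent_per_degree=15.0):
    """Approximate temperature change (°C) relative to the first sample.

    Higher N. pachyderma (sinistral) abundance implies cooling, hence the minus sign.
    """
    baseline = percent_series[0]
    return [-(p - baseline) / percent_per_degree for p in percent_series]


# Hypothetical abundances (%), oldest sample first.
delta_t = pachyderma_to_temperature_change([10.0, 25.0, 40.0, 5.0])
```

A rise from 10% to 40% maps to about 2 °C of cooling, while a drop to 5% maps to about 0.3 °C of warming.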
Extended Data Fig. 6 | Temperature fingerprints of the AMOC during
the twentieth century. a, Top, T_sub AMOC fingerprint obtained using
empirical orthogonal function (EOF) analysis of the EN4 dataset (light
green, the leading mode (EOF1) of T_sub variability from 1993-2003, as
defined by Zhang, applied to the EN4 data; dark green, the second
mode of T_sub variability (EOF2) of the North Atlantic for 1900-2015,
equivalent to the EOF1 defined for 1993-2003). No substantial twentieth-century
AMOC decline is seen in this observation-based reconstruction.
Bottom, instrument-based reanalysis of the ‘cold blob’ central SPG
region (red; 3-year (thin line) and 11-year (thick line) smoothing; 47° N
to 57° N, 30° W to 45° W) used in the Rahmstorf SST AMOC proxy.
The data are from the HadISST project. The reconstructed central SPG
SST bears some resemblance to the T_sub AMOC fingerprint record,
which is not unexpected given that the central SPG forms a substantial
spatial component of the T_sub fingerprint. No clear decrease is shown
in the central SPG SST, whereas the equivalent Rahmstorf AMOC proxy
(blue; central SPG minus the Northern Hemisphere (NH) temperature)
declines during the twentieth century because of the subtraction of the
NH warming trend. b, Reconstructed (predominantly terrestrial-based)
AMOC proxy (orange; the temperature difference between the central
SPG and the NH) and the central SPG SST reconstruction (blue).
There is a two-step decline in the AMOC proxy, at 1850-1900 and
1950-2000, the former being mainly the result of a strong cooling of the
SPG (which probably weakened northward heat transport, paralleling the
weakening shown by our DWBC proxy), and the latter being due mainly to
subtraction of the strong NH warming trend, rather than a persistent SPG
cooling.
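Panel a's point, that subtracting a warming NH mean can create an apparent AMOC decline even when the SPG itself shows no trend, can be checked with synthetic numbers. The series below are invented for illustration, the trend is an ordinary least-squares slope, and the function names are hypothetical; this is not the Rahmstorf reconstruction.

```python
def spg_minus_nh(spg_sst, nh_temp):
    """Rahmstorf-style index: central SPG temperature minus NH mean temperature."""
    return [s - n for s, n in zip(spg_sst, nh_temp)]


def ols_slope(values):
    """Least-squares slope per time step of an evenly spaced series."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


years = range(100)
spg = [0.0 for _ in years]        # no trend in the subpolar gyre
nh = [0.01 * t for t in years]    # steady NH warming, 0.01 °C per year
proxy = spg_minus_nh(spg, nh)
```

The proxy inherits the NH trend with the opposite sign (−0.01 °C per year) even though the SPG series is flat.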
Extended Data Fig. 7 | DWBC changes in model HadGEM3-GC2.
a, b, Climatological surface current direction (arrows) and speed (shaded, m s⁻¹) obtained from the control simulation with HadGEM3-GC2 and the satellite product OSCAR.
… vertically averaged ocean density (at 1,000-2,500 m) with the DLSD index
(as defined in ref. 4; green box, 1,000-2,500 m average) in a 340-year
present-day control run of the HiGEM model (see ref. *°). b, Climatology
of the modelled meridional ocean velocity (in m s⁻¹) averaged between
30° N and 35° N, illustrating the modelled position of the DWBC. The …
indices of DLSD and AMOC at 45° N (the dashed line omits the Ekman
component). We note that the box over which the DWBC flow index in
panel c is averaged has changed with respect to Fig. 1, to take
account of the fact that the return flow is deeper in the HiGEM model than
in HadGEM3-GC2.
Extended Data Fig. 9 | Comparison of Labrador Sea density parameters.
The model-based DLSD parameter (proposed in ref. 4 and using the EN4
reanalysis dataset) incorporates a larger area and greater depth range than
do instrumental-data-only studies, such as ref. >, which examines past
variability in Labrador Sea convection and focuses on the central Labrador
Sea and on depths less than 2,000 m, where most observational data are
available. The comparison here of DLSD (purple line, three-year mean)
from the EN4 dataset with instrumental data on density changes in the
central Labrador Sea at 1,500-1,900 m depth (grey line, annual averages;
black line, three-year mean) illustrates that the two parameters show very
similar variability. Both are dominated by the density changes caused by
deep convection in the Labrador Sea, which can reach down to around
2,000 m. Estimates of uncertainty are discussed in ref. *.
Extended Data Fig. 10 | Comparison with Gulf Stream Index (GSI).
A direct influence of the changing position of the Gulf Stream on the
grain size of our core sites can be ruled out by comparing instrumental
records of the Gulf Stream position (red, GSI) with the down-core
sortable-silt (SS) mean grain size data in 56JPC (blue; thicker line is
three-point smoothed). There is no clear correlation between these two
proxies (bottom). However, there is a coupling between our SS data (which
represent inferred DWBC_LSW flow speed) and density changes in the deep
Labrador Sea (grey, annual; black, three-point smoothed; top panel). The
2σ SS error bar (n = 30) is for the three-point mean.
Accelerated increase in plant species richness on
mountain summits is linked to warming
Globally accelerating trends in societal development and human
environmental impacts since the mid-twentieth century are
known as the Great Acceleration and have been discussed as a key
indicator of the onset of the Anthropocene epoch. While reports
on ecological responses (for example, changes in species range or
local extinctions) to the Great Acceleration are multiplying, it is
unknown whether such biotic responses are undergoing a similar 
acceleration over time. This knowledge gap stems from the limited 
availability of time series data on biodiversity changes across 
large temporal and geographical extents. Here we use a dataset of 
repeated plant surveys from 302 mountain summits across Europe, 
spanning 145 years of observation, to assess the temporal trajectory 
of mountain biodiversity changes as a globally coherent imprint of 
the Anthropocene. We find a continent-wide acceleration in the rate 
of increase in plant species richness, with five times as much species 
enrichment between 2007 and 2016 as fifty years ago, between 
1957 and 1966. This acceleration is strikingly synchronized with 
accelerated global warming and is not linked to alternative global 
change drivers. The accelerating increases in species richness on 
mountain summits across this broad spatial extent demonstrate that 
acceleration in climate-induced biotic change is occurring even in 
remote places on Earth, with potentially far-ranging consequences 
not only for biodiversity, but also for ecosystem functioning and 
services. 

Mountains are particularly sensitive to ecological change and 
are experiencing some of the highest rates of warming under 
anthropogenic climate change. Numerous reports of species re-distribution
towards summits and warming-induced changes in
biodiversity on summits suggest that mountain biota are highly
sensitive to increasing temperatures. The current accelerating trend
in temperature increase should therefore also affect the velocity of
changes observed for mountain biota. Appropriate empirical assessments
of the rate of change in the velocity of ecological responses
(biodiversity and ecosystem trajectories) to accelerated global warm- 
ing require long-term resurveys (for example, time series) of species 
communities, but these are scarce and localized. Mountain summits 
are especially suited for long-term studies of biotic responses to envi- 
ronmental changes because they represent natural permanent study 
sites that are easy to relocate over time, thus making it possible
to record reliable time series. By repeatedly resurveying alpine plant 
communities on 302 European mountain summits dating back as 
far as 1871, we generated time series for century-scale and conti- 
nent-wide biodiversity dynamics to assess potential acceleration 
trends in plant diversity dynamics (Fig. 1). Using these time series 
data, we tested whether the recent acceleration of climate change is 
driving a similarly accelerating change in species richness on moun- 
tain summits across the continent. 

We found that plant species richness has increased strongly over 
the past 145 years on the vast majority (87%) of Europe’s summits 
(generalized linear mixed effects model, P < 0.001; Fig. 2, Extended 
Data Fig. 1, Extended Data Table 1) and that the increase has accel- 
erated in the most recent years. This trend is consistent across all 
nine covered geographical regions, with no single region showing 
the opposite pattern. Across all summits, the increase in plant species
richness has accelerated over time (linear mixed effects models,
P < 0.001; Fig. 3, Extended Data Table 2), and the acceleration
has been particularly pronounced during the past 20-30 years 
(Figs. 2, 3). Fifty years ago (1957 to 1966) the rate of increase in 
species number averaged 1.1 species per decade (Fig. 3), whereas 
during the past decade (2007 to 2016) the summits gained 5.4 addi- 
tional species on average (Fig. 3). There is a positive relationship 
between the magnitude of increase in plant species richness and 
Fig. 1 | Geographical and temporal distribution
of studied summits and surveys. The study is
based on 698 surveys dating back to 1871 from
302 summits in nine mountain regions across
Europe. Each sampled summit is indicated
by one line (bottom right), with black crosses
indicating survey dates. Many of the historical
surveys were conducted by leading pioneers
in vegetation ecology in Europe (for example,
J. Braun-Blanquet, E. Du Rietz, E. Rübel and
B. Pawlowski). Numbers in brackets beside
the region names indicate the number of
summits/surveys. Photographs reproduced
with permission from ref. *! (left, second left
and second right; Botanic Garden Museum,
Jagiellonian University, Krakow) and ref. *”
(middle photograph; Wiley). Right-hand figure
reproduced from ref. *°. Geospatial data for the
map in all figures are from the WorldClim project
(http://www.worldclim.org/).
change and the species accumulation rates on mountain summits across
Europe corroborates the hypothesis that warming is the primary driver 


Fig. 2 | Average species richness change on 
mountain summits over time compared to mean 
annual temperature over time. Upper parts of 
inset panels, mean annual temperature; lower part, 
change in species richness (in species numbers). 
Nops, number of summits/surveys within the 
mountain region providing data for the panel. 
Correlation between rate of change in species 
richness and rate of change in temperature (AT-o,) 
is positive for all mountain regions (Extended 
Data Fig. 2). Orange shading marks the 5th and 
95th percentiles of the resulting richness change 
values from a bootstrapping approach across all 
summits in one region; see Extended Data Fig. 1 
for methodological details. 
Fig. 3 | Rate of species richness change over time. a, Number of slope
parameters per year (N; comparisons of earlier survey and later sampled
resurvey). b, Rate of change in species richness (mean, black line).
Positive values indicate an increase in species richness on summits and
negative values indicate a decrease. Rates ($\Delta SR$ per year $= (SR_2 - SR_1)/(t_2 - t_1)$,
where $SR$ is species richness and $t$ is time) were averaged across
all summits and inversely weighted by the number of years between
observations ($t_2 - t_1$) to account for temporal resolution, as a longer
period between surveys might mask short-term fluctuations. The black
line interpolates across all summits with a generalized additive (spline)
smooth model (R package mgcv version 1.8-17; the smooth term (k = 50)
was chosen to allow enough degrees of freedom to closely represent the
underlying pattern). The shaded grey area represents ± s.e.m.


of locally observed upward shifts of species ranges in mountains
(Fig. 2) and their recent acceleration. Our findings thus align with
those of shorter-term studies demonstrating plant community thermophilization
and range shifts driven by warming.
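The rate calculation in the Fig. 3 caption can be sketched for a set of survey-resurvey comparisons. Each rate is (SR2 − SR1)/(t2 − t1) and, following the caption, rates are averaged with weights inversely proportional to the interval t2 − t1. The sketch computes a single weighted mean rather than the spline (GAM) smoothing over time used for the figure; the function names and numbers are hypothetical.

```python
def richness_rate(sr_1, sr_2, t_1, t_2):
    """Species gained per year between an earlier survey and a later resurvey."""
    return (sr_2 - sr_1) / (t_2 - t_1)


def weighted_mean_rate(comparisons):
    """Mean rate, weighting each comparison by 1 / (t_2 - t_1).

    comparisons: iterable of (sr_1, sr_2, t_1, t_2) tuples.
    """
    weighted_sum = total_weight = 0.0
    for sr_1, sr_2, t_1, t_2 in comparisons:
        weight = 1.0 / (t_2 - t_1)
        weighted_sum += weight * richness_rate(sr_1, sr_2, t_1, t_2)
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical comparisons: a 10-year and a 50-year interval.
rate = weighted_mean_rate([(10, 20, 2000, 2010), (10, 30, 1960, 2010)])
```

The 10-year comparison (1.0 species per year) gets five times the weight of the 50-year one (0.4 species per year), so the weighted mean is 0.9 rather than the unweighted 0.7.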

The observed relationship between temperature change and species 
richness change over the past 145 years is consistent across all nine 
regions. Changes in precipitation and nitrogen deposition also correlate 
regionally with changes in species richness, but the direction and mag- 
nitude of these effects differ strongly among regions (Extended Data 
Fig. 2). Although precipitation change (ΔP per year) has a moderate
(positive) effect on species richness trends across Europe (Extended
Data Table 3, Fig. 4b), its effect is not consistently significant across
all analysed regions (Extended Data Table 4, Extended Data Fig. 2) and
is minor compared with the effect of temperature change (ΔT per year;
Extended Data Tables 4, 5). Changes in grazing and tourism could also
affect plant species richness on summits. Local studies
have suggested that grazing and frequent disturbance by tourists
may suppress the elevational advance of alpine plants in response to 
warming in mountains. Although quantification of these relationships 
is challenging, locally declining levels of domestic livestock have often 
coincided with recovery of wild ungulate populations. Hiking tourism
has increased on some summits, but intensities of human impact
vary strongly. Land-use changes may thus explain parts of the local 
variation in species richness trends, but they vary greatly within and 
among regions. Without a consistent impact on species re-distribution, 
it is unlikely that changes in grazing and tourism can account for the 
consistent, continent-wide increase in plant species richness evident 
in our data. 

Some previous observations have suggested that upslope species 
migration in mountains occurs almost in synchrony with climate 
warming, whereas findings from other studies indicate that long lags
in dispersal, establishment and extinction can be expected for many
alpine plant species. We systematically tested for time-lags (up to 10
years) in increases in species richness following changes in climate, but 
found that the inclusion of time-lags did not significantly improve the 
explanatory power of our models (Extended Data Table 6). This finding 
suggests that increases in species richness on European summits are a 
direct and immediate response to climate warming (Fig. 2) and, thus, 
can be expected to accelerate further as climate warming continues 
Fig. 4 | Rate of species richness change related to the rate of temperature
change and precipitation change across all sampled mountains in
Europe. a, Rate of species richness change ($\Delta SR$ per year $= (SR_2 - SR_1)/(t_2 - t_1)$)
related to the rate of temperature change. b, Rate of species
richness change related to the rate of precipitation change. Note that this
pattern differs considerably among regions (see Extended Data Fig. 2
for more details at the regional level). Dots are semi-transparent, with
darker symbols indicating overlapping points. Trend lines and R² values
are based on univariate linear regressions; significance, indicated by
stars, is based on F statistics (see Methods and Extended Data Table 3).
The relationship between change in species richness and accumulated
nitrogen (not shown) is not significant because nitrogen deposition varies
strongly across Europe, whereas the change in species richness shows
the same trend across the continent. Figures and models are based on
396 observations (comparisons across all 698 surveys and resurveys of the
302 summits). See text and Methods for more detailed analyses with
generalized mixed effects offset models, including regional differences.


to accelerate. However, because we focus on the average trend and
do not account for non-colonizing lower-altitude species, we cannot 
exclude the possibility that only a fraction of species responded quickly 
to climate change, thus creating the observed relationship, while an 
unknown number of species lags behind the change in climate. Our 
observations may, therefore, underestimate the expected long-term 
species turnover on summits. 
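The counts in the Fig. 4 caption are consistent with each summit contributing one comparison per pair of consecutive surveys, so that a summit surveyed k times yields k − 1 comparisons and the total is 698 − 302 = 396. The sketch below makes that bookkeeping explicit; pairing consecutive surveys is an inference from these numbers rather than a stated rule, and the summit names and years are hypothetical.

```python
def consecutive_pairs(survey_years):
    """Pair each survey with the next resurvey of the same summit."""
    years = sorted(survey_years)
    return list(zip(years[:-1], years[1:]))


def count_comparisons(surveys_by_summit):
    """Total number of survey-resurvey comparisons across all summits."""
    return sum(len(consecutive_pairs(years)) for years in surveys_by_summit.values())


# Hypothetical summits with three and two surveys.
surveys = {"summit_a": [1905, 1957, 2008], "summit_b": [2001, 2015]}
n_comparisons = count_comparisons(surveys)
n_surveys = sum(len(years) for years in surveys.values())
```

For any such dataset the number of comparisons equals the number of surveys minus the number of summits, which reproduces 396 for 698 surveys on 302 summits.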

The accelerated increase in species richness on mountain summits 
is likely to result from an upward shift in the upper range limits of an 
increasing number of species. Trait analyses show that new colonizers 
exhibit growth strategies characteristic of species from lower elevations, 
such as larger size (P < 0.001), higher specific leaf area (P < 0.001) and
a general association with warmer temperatures (P < 0.001; Extended 
Data Table 7) compared to established species. Ultimately, the lower 
range limits of species will also shift upwards, but these limits are often 
determined and changed by biotic interactions and are, therefore, only 
indirectly related to temperature’. As more species become established 
at high-elevation sites, local extinctions will be likely to result from 
competitive replacement of slow-growing, stress-tolerant alpine spe- 
cies by more vigorous generalists that benefit from warming, rather 
than by direct adverse effects of warming on the summit species”®. 
However, competitive replacement of resident species requires that 
colonizers build up sufficiently large populations. Local extinctions 
should hence follow colonization with a time-lag. Consequently, accel- 
erating plant species richness increases are expected to be a transient 
phenomenon that hides the accumulation of a so-called extinction 
debt”*?’. The relaxation time until this debt is paid off is likely to be 
characterized by continuous shifts in abundance ratios, which may 
serve as sensitive early warning signals of upcoming extinctions’*. The 
length of this relaxation time will probably depend on factors such as 
the longevity of high-elevation species, plant clonal abilities and the 
local microhabitat diversity, supporting the persistence of cold-climate 
microrefugia for high-alpine species”*”?. Although these processes, 
along with species’ intrinsic ability to tolerate changing climates, may 
buffer local extinctions, a rapid loss of alpine-nival species may occur 
under accelerated climate warming. Additionally, if major changes and
extinctions in alpine systems are not gradual, but are instead initiated by 
threshold-like dynamics (for example, shrub and tree encroachment), 
critical tipping points may be approached with increasing speed under 
accelerated climate warming. 

Our results underline the link between accelerating climate warm- 
ing and species richness change in mountains. We thus provide a 
particularly compelling example of the human-driven impact on ter- 
restrial biota that is highly consistent with the recently reported Great 
Acceleration in Earth system trends in the Anthropocene and strik- 
ingly synchronous with the recent accelerating trends observed in many 
socio-economic indicators®. The observed acceleration of biodiversity 
change in mountain ecosystems highlights the rapid and widespread 
consequences of human activities on the biosphere, with important 
consequences for ecosystem functioning, human wellbeing, and the 
dynamics of climate change*®. 


Online content 

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0005-6.

METHODS


Vegetation resurveys on European mountain summits. Precise relocation of vegetation records is possible on mountain summits. European botanists, fascinated by the limits of plant life, noted this potential more than a century ago (Fig. 1)'*: “On the basis of a comprehensive description of locations, it will not be difficult to verify my species lists, and an increase or decrease of species richness in the future will be possible to detect with high certainty” (Josias Braun-Blanquet in 1913, translated from™, p. 329). This foresight, and the data these botanists gathered on mountain summits, give us the opportunity to study the effect of accelerated warming on plant species richness. Summits are thus well suited to resurveys of species occurrences and to detecting changes in plant species richness over time, even when the first surveys were carried out before the GPS era. In this study, 302 summits with historical vegetation records were resurveyed between one and six times, resulting in a total of n = 698 surveys. All vegetation surveys were conducted in summer. For each survey, all plant species occurring on the summit (generally delineated by the uppermost 10 m of elevation)*° were noted. Vegetation surveys were compared for each summit. Species names were standardized to the nomenclature of Flora Europaea (or local flora for species absent in the Flora; see Supplementary Information).

Environmental data. For each summit, mean monthly temperature and precipitation were calculated following the established change factor methodology*®, which combines statistical downscaling with temporal trend analyses. First, temporal data available from CRU TS 3.23 (0.5° resolution, 1901–2015)*” and the European Gridded Monthly Temperature (0.5° resolution, 1765–2000)°* were statistically related to the higher spatial resolution of WorldClim monthly mean climatic grids (30 arcsec resolution) for the overlapping period of 1950 to 2000 using the change factor method**. We assumed that the anomalies computed for this overlapping period (the 1950–2000 mean of the coarse-grained climatic conditions minus the climatic conditions within each smaller WorldClim pixel) remained the same before 1950 and after 2000. Second, elevational differences between summits and the mean elevation of the corresponding WorldClim digital elevation model were included as an additional correction term (−0.006 °C × Δelevation (m)) for mean temperature data. By combining the two corrections, temporal trends available from the 0.5° resolution temporal data were corrected for differences originating from scale and climate model, and for the precise elevation of the summit (temperature only). While we consider the resulting temporal trends for the temperature data to be reliable owing to their generally higher spatial and temporal autocorrelation and higher correlation with elevation, the precipitation data do not show a systematic change with elevation and are less predictable over small spatial distances*? and, therefore, need to be interpreted more cautiously. Environmental variables were included in the models only after calculating temporal changes (see ‘Importance of environmental drivers’). Because offsets introduced by the spatial interpolation are constant over time and cancel out when temporal changes are calculated, the environmental variables are unbiased by such weaknesses in the spatial interpolations. For temperature and precipitation, time series from CRU TS 3.23 (1901–2015) and the European Gridded Monthly Temperature (1765–2000) were combined to match the study period (1880–2016) by taking the mean per grid cell for the overlapping years (Spearman r = 0.97 for the overlapping period 1901–2000). As neither of the two data sources extends to 2016, climate values for 2015 were also used for 2016 for the 19 affected summits. Furthermore, historical nitrogen deposition data (NHx and NOy, modelled from 1850 to 2010) were extracted from the European Fluxes Database (http://www.europe-fluxdata.eu/) and extrapolated for the missing six years (2011–2016). The data originate from the global chemistry transport model version 5 (TM5, annual data with a 0.25° latitude/longitude resolution)*°. Data handling and all subsequent analyses were conducted in R version 3.3.1⁴¹.
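The two corrections above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: offsets are treated as additive, each time series is a Python dict mapping year to value, and all function names are hypothetical. The offset added here is the anomaly defined in the text with its sign reversed (fine pixel minus coarse cell), and the lapse-rate term uses the −0.006 °C per metre given above. The last function mirrors how the two gridded series were combined (mean over overlapping years) and how the 2015 value was reused for 2016.

```python
# Minimal sketch of change-factor downscaling plus lapse-rate correction.
# Assumptions (not from the paper's code): additive offsets, yearly values
# stored as {year: value} dicts, hypothetical function names.

LAPSE_RATE_C_PER_M = -0.006  # correction term from the Methods: -0.006 °C x delta elevation (m)


def change_factor_offset(coarse_ref_mean, fine_ref_mean):
    """Fine WorldClim pixel minus coarse 0.5° cell, both averaged over 1950-2000.

    This is the anomaly defined in the text with its sign flipped, so it can be
    added to the coarse series; it is assumed constant before 1950 and after 2000.
    """
    return fine_ref_mean - coarse_ref_mean


def downscale_temperature(coarse_series, coarse_ref_mean, fine_ref_mean,
                          summit_elev_m, dem_elev_m):
    """Shift a coarse yearly temperature series onto a single summit."""
    offset = change_factor_offset(coarse_ref_mean, fine_ref_mean)
    # A summit above the mean DEM elevation of its pixel gets a negative (colder) correction.
    elev_correction = LAPSE_RATE_C_PER_M * (summit_elev_m - dem_elev_m)
    return {year: value + offset + elev_correction
            for year, value in coarse_series.items()}


def merge_series(series_a, series_b, last_year):
    """Average two yearly series where they overlap, otherwise use whichever exists.

    Years after the latest available year reuse that year's value
    (as 2015 was reused for 2016 in the text).
    """
    merged = {}
    for year in sorted(set(series_a) | set(series_b)):
        values = [s[year] for s in (series_a, series_b) if year in s]
        merged[year] = sum(values) / len(values)
    latest = max(merged)
    for year in range(latest + 1, last_year + 1):
        merged[year] = merged[latest]
    return merged
```

For example, `downscale_temperature({2000: 5.0}, 4.0, 2.0, 2500, 2000)` shifts the coarse value by −2 °C for the pixel offset and by a further −3 °C for a summit lying 500 m above the pixel's mean elevation.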

The velocity of species richness changes. Species richness (SR) on mountain summits was analysed for its change with time (t, year of record) across all summits by implementing a generalized linear mixed effects model (GLMM) with a Poisson family error distribution (SR ~ t) and a random effect (intercept) of mountain to account for repeated samples (GLMM 1 in Extended Data Table 1; all mixed effects models were built with R package lme4 version 1.1-12)”. Further, we ran the models including random effects (intercept) of region (mountain nested in region; GLMM 2 in Extended Data Table 1) and observation ID (to account for overdispersion‘?; GLMM 3 in Extended Data Table 1). All models provided qualitatively equivalent results (Extended Data Table 1). We repeated all GLMMs allowing a breakpoint (bp) in the relationship between species richness and time by fitting independent slope coefficients for the time periods before and after the breakpoint (SR ~ ifelse(t < bp, bp - t, 0) + ifelse(t < bp, 0, t - bp) + random structure). The breakpoint was fitted independently by minimizing the model deviance (Extended Data Table 1).
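The breakpoint search can be illustrated with a plain Poisson GLM. The sketch below is a simplification under stated assumptions: it drops the random effects that the paper fitted with lme4, fits each candidate model by iteratively reweighted least squares in pure Python, and scans integer breakpoints strictly inside the observed range; all function names are hypothetical. The two hinge columns reproduce the ifelse terms of the model formula above, and the breakpoint with the smallest deviance is kept.

```python
import math

# Breakpoint search sketch: fit a Poisson GLM (log link) for every candidate
# breakpoint and keep the one with the smallest deviance. Simplification: the
# random effects used in the paper are omitted; names are hypothetical.


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design(years, bp):
    """Intercept plus ifelse(t < bp, bp - t, 0) and ifelse(t < bp, 0, t - bp)."""
    return [[1.0, bp - t if t < bp else 0.0, 0.0 if t < bp else t - bp]
            for t in years]


def fit_poisson(x, y, max_iter=50, tol=1e-8):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    p = len(x[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        eta = [sum(b * v for b, v in zip(beta, row)) for row in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = [[sum(m * row[i] * row[j] for m, row in zip(mu, x)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for m, row, zi in zip(mu, x, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        step = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in x]
    return beta, mu


def poisson_deviance(y, mu):
    """2 * sum(y log(y/mu) - (y - mu)), with y log(y/mu) taken as 0 when y == 0."""
    return 2.0 * sum((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m)
                     for yi, m in zip(y, mu))


def fit_breakpoint(years, richness):
    """Return (breakpoint, coefficients) minimizing the Poisson deviance."""
    best = None
    for bp in range(int(min(years)) + 1, int(max(years))):
        beta, mu = fit_poisson(design(years, bp), richness)
        dev = poisson_deviance(richness, mu)
        if best is None or dev < best[0]:
            best = (dev, bp, beta)
    return best[1], best[2]
```

Fitting the mixed model for each candidate breakpoint (for example with `lme4::glmer` in R) would keep the same outer loop; only the inner fitting routine changes.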

Acceleration of species richness changes. The potential acceleration in the average velocity of species richness changes on mountain summits between 1871 and 2016 was tested by means of a linear mixed effects model (LMM) with a Gaussian family error distribution (ΔSR/Δt ~ t_mid). With the model, we analysed the rate of change in species richness over time, where t_mid is the midpoint year between two surveys (t_mid = (t₁ + t₂)/2). The dependent variable ΔSR/Δt was calculated from the difference in species richness and the difference between years of observation of two consecutive surveys on the same summit ((SR₂ − SR₁)/(t₂ − t₁)). A random effect (intercept) of mountain was included to account for repeated samples. We also ran the model including a random effect (intercept) of mountain nested within region but found qualitatively similar results (Extended Data Table 2). Mathematically, ΔSR/Δt is independent of richness on the summits as well as of time elapsed between sequential visits on the summit. However, more species-rich summits seemed to be associated with higher rates of change, as indicated by a significant positive effect when the species richness of the first survey was included as an explanatory variable in the fixed component of the LMM (Extended Data Table 2). We also tested whether the number of years between two consecutive surveys had an effect on ΔSR/Δt, as a longer period between surveys might mask short-term fluctuations, but this effect was not significant (Extended Data Table 2).
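The pairing of consecutive surveys and the acceleration test can be sketched without the mixed-model machinery. This sketch assumes records given as hypothetical (summit, year, richness) tuples and replaces the LMM with an ordinary least-squares slope (no random intercept), so it only illustrates the logic. Pairing consecutive surveys yields one comparison fewer than the number of surveys on each summit, which is how 698 surveys on 302 summits give 396 comparisons.

```python
from collections import defaultdict


def consecutive_rates(surveys):
    """Turn (summit, year, richness) records into per-interval rates.

    Returns (summit, midpoint_year, rate, years_between) tuples, where
    rate = (SR2 - SR1) / (t2 - t1) for consecutive surveys of one summit.
    """
    by_summit = defaultdict(list)
    for summit, year, richness in surveys:
        by_summit[summit].append((year, richness))
    rates = []
    for summit, records in by_summit.items():
        records.sort()  # chronological order within each summit
        for (t1, sr1), (t2, sr2) in zip(records, records[1:]):
            rates.append((summit, (t1 + t2) / 2, (sr2 - sr1) / (t2 - t1), t2 - t1))
    return rates


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x; a positive slope means the rate increases."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx
```

A positive slope of the rate on the midpoint year corresponds to an accelerating richness increase.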

A linear increase in the rate of change with time (ΔSR/Δt ~ t_mid) corresponds to an accelerated richness increase. As Figs. 2 and 3 indicate a nonlinearity in the relationship, we also ran all models allowing a breakpoint in the relationship between the rate of change and time (the midpoint year between surveys; Extended Data Table 2). It is likely that the real breakpoint (as opposed to the onset) of the acceleration trend in the increase in plant species richness happened slightly later than the breakpoint suggested by this particular analysis. Indeed, the estimated breakpoint approximates the timing of change as the midpoint year between two sequential surveys and thus systematically shifts every change towards the median of the time series.

In the raw data, the average rate of species richness increase per summit was found to be much higher in the past decade (2007–2016; +2.9 species) than fifty years earlier (1957–1966; +1.1 species). When the slopes are averaged across all summits with an observation before and after a given year, inversely weighted by the number of years between observations (to account for temporal resolution, as a longer period between surveys might mask short-term fluctuations), the differences become even more apparent (+5.4 species in the past decade as opposed to +1.1 species per decade fifty years earlier).

We analysed changes in absolute species numbers, as relative changes are sensi- 
tive to the richness values to which they are normalized. Still, repeating the linear 
mixed effects model with the changes in relative species richness (calculated by 
taking the difference between survey and resurvey normalized by resurvey richness 
and years between observations) revealed equivalent results and the same conclu- 
sions as using changes in absolute species numbers over time. 
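The relative measure can be written out explicitly. This follows our reading of the sentence above (the difference between survey and resurvey, divided by resurvey richness and by the number of years between the two observations); the function name is hypothetical.

```python
def relative_rate(sr_first, sr_resurvey, t_first, t_resurvey):
    """Richness change normalized by resurvey richness and by the years elapsed."""
    return (sr_resurvey - sr_first) / sr_resurvey / (t_resurvey - t_first)
```

For a summit going from 10 to 15 species between 1960 and 2010, this gives 5/15/50, about 0.0067 per year.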

Visualization of temporal changes in richness. The average richness change per year (ΔSR/Δt = (SR₂ − SR₁)/(t₂ − t₁)) across all summits was calculated (Extended Data Fig. 1a). Figure 3 displays how the average ΔSR/Δt across all summits changed over time. As values of ΔSR/Δt from summits with a higher temporal sampling density better represent the instantaneous rate of change for a specific year (t), we inversely weighted the calculated values of ΔSR/Δt by the difference in years between observations (t₂ − t₁) to account for temporal resolution.
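The inverse weighting can be sketched as below. Which intervals contribute to a given year is an assumption here (those whose survey and resurvey bracket that year, as in the averaging used for the raw-data comparison above); intervals are passed as hypothetical (t1, t2, SR1, SR2) tuples.

```python
def weighted_rate_for_year(intervals, year):
    """Inverse-interval-weighted mean of dSR/dt for one calendar year.

    intervals: (t1, t2, sr1, sr2) tuples for consecutive surveys of a summit.
    Only intervals spanning `year` contribute; each rate gets weight 1 / (t2 - t1).
    Returns None when no interval spans the year.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for t1, t2, sr1, sr2 in intervals:
        if t1 <= year <= t2:
            weight = 1.0 / (t2 - t1)
            weighted_sum += weight * (sr2 - sr1) / (t2 - t1)
            weight_total += weight
    return weighted_sum / weight_total if weight_total else None
```

With a 60-year interval at 0.2 species per year and a 10-year interval at 0.5 species per year, the weighted mean is about 0.46 rather than the unweighted 0.35, so the densely sampled summit dominates.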

The changes in species richness per year (ΔSR/Δt) accumulate over time and result in an absolute change in species richness (Extended Data Fig. 1b). These absolute changes in species richness are visualized for each region in Fig. 2 (black line). To also visualize variability within regions, confidence intervals were calculated on the basis of the standard deviation of richness change among summits in a region (Extended Data Fig. 1c, d).

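Accumulating the yearly mean rates into an absolute change (Extended Data Fig. 1b) amounts to a running sum. This minimal sketch assumes the rates are available for consecutive calendar years, so each value contributes exactly one year of change; the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_change(yearly_mean_rate):
    """Running sum of mean yearly richness change, keyed by year.

    yearly_mean_rate: {year: mean dSR/dt}, assumed to cover consecutive years.
    """
    years = sorted(yearly_mean_rate)
    return dict(zip(years, accumulate(yearly_mean_rate[y] for y in years)))
```
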
Importance of environmental drivers. The average velocity of species richness changes (ΔSR/Δt) was related to the change in mean annual temperature (ΔT/Δt; T is temperature) and precipitation (ΔP/Δt; P is precipitation) for the same period (see below for further details), as well as to the accumulated nitrogen deposition (Naccum; details explained below) across all summits, by implementing LMMs with a Gaussian family error distribution that included each of the three potential explanatory variables (different rows in Extended Data Table 3; the model formula is given in the table caption). Variable performance was compared using the version of the Akaike Information Criterion corrected for small sample size (AICc™). All LMMs consistently detected a clear positive relationship between species richness changes and temperature changes, and a slightly weaker positive relationship with precipitation changes. In particular, the relationship with temperature change is surprisingly strong, considering that climate models are built on long-term air temperature measurements taken two metres above ground at climate stations that are mainly located in valleys, and can therefore only approximate changes in growth conditions for summit species. No relationship with the accumulated nitrogen deposition was detected across Europe (Extended Data Table 3).
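For reference, AICc and the Akaike weights reported in the Extended Data tables can be computed as below. These are the standard formulas; k must count every estimated parameter, including variance components in a mixed model, and the function names are illustrative.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    return 2 * k - 2 * log_likelihood + 2 * k * (k + 1) / (n - k - 1)


def akaike_weights(aicc_values):
    """Relative support for each candidate model; the weights sum to 1."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]
```

With the four AICc values of Extended Data Table 3 (488.1, 489.3, 509.2, 556.2), this gives weights of about 0.65, 0.35, 0.00 and 0.00; the small difference from the tabulated 0.64 and 0.36 is consistent with the AICc values having been rounded to one decimal place.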

The explanatory variables ΔT/Δt and ΔP/Δt were calculated as the mean change per year (for example, ΔT/Δt = (T₂ − T₁)/(t₂ − t₁)). Climate variables such as temperature and precipitation are usually integrated over longer time periods to level out short-term fluctuations. As we were interested in the effect of such shorter-term fluctuations, we systematically tested which integration periods (1–30 years) would provide the best fit within our LMM framework. Besides mean annual temperature and precipitation, we further tested alternative measurements of the climate variables. If species’ ranges were limited primarily by growing season temperatures, we would expect spring and summer warming to best explain temporal changes in species richness. Alternatively, if many alpine species were limited not by growing season temperature but by climatic extremes, winter temperatures or precipitation might be more important in determining which species can survive in a given location. We therefore systematically pre-analysed temperature and precipitation variables, including the effects of winter precipitation (December–February) and of snow accumulation (precipitation in months with a mean temperature below freezing).

Further, nitrogen from deposition may accumulate in the soil, particularly in high-elevation systems with limited resource cycling***. In our data, nitrogen deposition has declined sharply in recent decades”, although its accumulated effect may still influence community dynamics”’”. We thus calculated the accumulated deposition of both NHx and NOy since 1850 for each vegetation survey.
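The climate and nitrogen predictors described in the two paragraphs above can be sketched as follows. Assumptions not stated in the text: integration windows end in the survey year itself, winter precipitation combines December of the previous year with January and February, and accumulated deposition is a simple sum of annual values; all names are hypothetical.

```python
def window_mean(annual, end_year, window):
    """Mean of yearly values over the `window` years ending at end_year (inclusive)."""
    values = [annual[y] for y in range(end_year - window + 1, end_year + 1)]
    return sum(values) / len(values)


def climate_rate(annual, t1, t2, window):
    """Mean change per year between two surveys, e.g. dT/dt = (T2 - T1) / (t2 - t1)."""
    return (window_mean(annual, t2, window) - window_mean(annual, t1, window)) / (t2 - t1)


def snow_accumulation(monthly_temp, monthly_precip):
    """Precipitation summed over months whose mean temperature is below 0 °C."""
    return sum(p for t, p in zip(monthly_temp, monthly_precip) if t < 0)


def winter_precipitation(monthly_precip, year):
    """December of the previous year plus January and February of `year`.

    monthly_precip: {year: [12 monthly values, January first]}.
    """
    return monthly_precip[year - 1][11] + monthly_precip[year][0] + monthly_precip[year][1]


def accumulated_deposition(annual_deposition, survey_year, start_year=1850):
    """Annual nitrogen deposition summed from start_year up to the survey year."""
    return sum(annual_deposition[y] for y in range(start_year, survey_year + 1))
```

The `window` argument corresponds to the 1–30-year integration periods tested above.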

The systematic test of different variables and time periods (Extended Data Table 5) identified annual summer temperature (15-year mean), annual precipitation (1-year mean) and accumulated NOy deposition (referred to as Naccum) as the most suitable predictors, and these variables were then used in all subsequent analyses. As this type of variable selection biases analyses towards significant relationships, all analyses were repeated with mean annual values (10-year mean), resulting in qualitatively similar results. Model residuals were visually checked for temporal autocorrelation, and there was no sign of a temporal trend in the residuals.

Time-lags in richness change. Biotic responses to climate change may be delayed!”4, as species may need considerable time to spread and establish (compare migration and establishment lags). Therefore, observed species richness on a mountain summit at a given point in time could reflect climatic conditions from several years earlier. We thus implemented a systematic time-lag between our species observations and the climate period used to relate the average velocity of species richness changes to changes in climatic conditions, and tested whether including a time-lag (5 or 10 years) increased explanatory power (Extended Data Table 6). Final results are presented without time-lags because including them did not increase the ability of our analyses to explain the average velocity of species richness changes.
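A time-lag moves the climate windows back relative to the survey years while keeping the survey interval as the denominator. A minimal sketch, assuming window means that end in the lagged year; the function name is hypothetical.

```python
def lagged_climate_rate(annual, t1, t2, window, lag):
    """Climate change per year between two surveys, using windows ending `lag` years earlier.

    annual: {year: value}. The denominator stays the survey interval t2 - t1.
    """
    def window_mean(end_year):
        values = [annual[y] for y in range(end_year - window + 1, end_year + 1)]
        return sum(values) / len(values)

    return (window_mean(t2 - lag) - window_mean(t1 - lag)) / (t2 - t1)
```

For a climate step in 1990 and surveys in 1980 and 2000, a 10-year lag still captures the step, whereas a 15-year lag moves both windows before it and the rate drops to zero.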

An alternative approach to analysing the average velocity of species richness changes (ΔSR/Δt) with rates of change in environmental predictors (ΔT/Δt, ΔP/Δt; see Extended Data Table 3) is to directly relate species richness changes (ΔSR) to changes in environmental variables over the same period (ΔT, ΔP). This approach is more intuitive (and closer to the data) but ignores differences in time between sampling events. Analyses using this approach yielded results qualitatively similar to those of the main analysis (Extended Data Table 3), except that the effect of precipitation changes was not significant (Extended Data Table 4).

Trait-based analyses. Differing trait signal in colonizing species. Changes in plant life strategies and dispersal constraints would be reflected in systematic differences in indicative traits. We thus compared specific leaf area (SLA)**, plant height** and seed mass** among colonizing species and species in the resident community, using an LMM framework with ‘resurvey’ as a random effect. To test for the colonization and establishment, within the recipient community, of warmth-tolerating species from lower elevations, we used Landolt species indicator values for temperature”. For 364 resurveys (12,738 observations for 873 species), direct comparisons of plant trait values of newly established colonizers (that is, additional species recorded in a resurvey) with those of species that had been present in the previous survey (recipient community) indicate significantly increased SLA (P < 0.001) and plant height (P < 0.001) of successful colonizers, but no significant difference in seed mass (P = 0.85). Colonizers were also more adapted to warmer climates (showing higher Landolt temperature values) than species of the resident community (P < 0.001; Extended Data Table 7).
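The split between colonizers and the recipient community can be sketched with set operations. Species names and trait values in any example are hypothetical, and the paper's actual comparison used an LMM with ‘resurvey’ as a random effect rather than the plain means computed here.

```python
from statistics import mean


def colonizers_and_residents(previous_survey, resurvey):
    """Colonizers: species in the resurvey that were absent from the previous survey.

    Residents: the recipient community recorded in the previous survey.
    """
    residents = set(previous_survey)
    colonizers = set(resurvey) - residents
    return colonizers, residents


def mean_trait(species, trait_values):
    """Mean trait value over species with a known value (None if none is known)."""
    values = [trait_values[s] for s in species if s in trait_values]
    return mean(values) if values else None
```

Species without a trait record are skipped rather than imputed, which is one simple way to handle incomplete trait databases.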

Data reliability. Sampling intensity. Our analysis of the rate of change is relatively robust with respect to different sampling periods. The increasing sampling frequency over time (Fig. 1) helped to reliably quantify the rates of change in later time periods and thus supports our conclusion of an acceleration in richness change. Consistent continent-wide, short-term fluctuations in species richness that might have occurred in the early 20th century would probably go undetected owing to the low availability of data from that period in our time series, but long-term trends would be clearly visible. There is, however, no evidence that the unbalanced sampling effort over time and the different sampling intervals hide unobserved fluctuations in early periods. In line with this, the summits for which we have a large number of repeated surveys show small short-term fluctuations but confirm the detected steady increase of richness over time and an acceleration in recent years¹⁰.


Observer errors. Previous studies explicitly addressing observer errors in summit resurveys have demonstrated that vegetation change can be reliably quantified over long time periods’. Many of the early records were collected by expert botanists with a scientific interest in long-term changes and the explicit aim of enabling accurate later resurveys. To further reduce potential sampling and observer errors, recent resurveys were conducted without knowledge of the past species lists, because surveyors who know the historical species composition have a higher chance of finding certain species again. Approximately 15% of all summits in this dataset have species records collected in the 1980s and 1990s (in some cases, these surveys were even carried out by the same people). Even if these early resurveyors also considered the above methodological issues, we cannot rule out that their observer effort was greater than that of the historical surveyors. However, our carefully implemented resurvey methodology ensured that our recent observer effort did not exceed that of the early resurveyors during the 1980s and 1990s. Given this, the clear signal that most of the increase in species richness occurred after the 1980s and 1990s strongly indicates that a possible increase in observer effort, if present, is responsible for only a limited amount of the increase in species richness. We are, thus, confident that observer errors did not systematically influence our analyses.

Summit area. Summit area may affect the observed changes in species richness, probably through its effect on species richness (compare the species–area relationship). We account for this potential effect of area on the change in species richness by including absolute species richness as a covariate in our analyses. A potential direct effect of area could be tested only for the summits within Switzerland, as data of sufficient spatial resolution to calculate the surface area of the uppermost 10 m of a summit were available to us only for this country (swissALTI3D model, a digital elevation model with 2-m resolution). The area of Swiss summits varied by a factor of more than 40 (392–16,720 m²). Surprisingly, regression analyses indicated no significant effect of summit area on the historical or recent species number, or on the change in species number (area was log-transformed to approximate a normal distribution). Further evidence of a limited effect of summit area comes from the fact that in recent resurveys the species numbers of historical surveys were reached within the uppermost 4–5 m of each summit, which on a conical summit corresponds to a much smaller area than the originally sampled uppermost 10 m (Extended Data Fig. 3). We conclude that, on mountain summits, factors independent of area, for example, environmental conditions and micro-topographic variability*®, seem much more important for species richness or changes thereof than area per se.

Data and code availability. Data and R code are available from the corresponding 
authors. 

Extended Data Fig. 1 | Visualizing richness change. This conceptual figure shows the approach implemented in the main text to visualize richness change over time based on the raw data (Figs. 2, 3). a, The mean richness change per year (ΔSR/Δt = (SR₂ − SR₁)/(t₂ − t₁)) across all summits was calculated (Fig. 3). b, The mean richness change per year accumulates with time to yield absolute changes in species richness (black line in Fig. 2). c, d, Variability in the absolute change in species richness was visualized by randomly sampling ΔSR from all mountains available each year, but adding the s.d. within a region and year. The displayed range in Fig. 2 shows the 5th and 95th percentiles of the resulting richness change values from 1,000 runs (orange shading in Fig. 2). This approach reveals changes in variability among mountains over time while also showing overall variability for time steps where only a few summits were sampled (particularly in early time periods).
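The resampling behind panels c and d can be sketched as follows. The caption leaves the exact noise model open; here ‘adding the s.d.’ is read as adding Gaussian noise with the within-year standard deviation of the region's summits, which is an assumption, and the function name is hypothetical. Percentiles are taken with `statistics.quantiles` (Python 3.8 or later), which needs at least two runs.

```python
import random
import statistics


def richness_envelope(rates_by_year, n_runs=1000, seed=1):
    """5th and 95th percentiles of simulated cumulative richness change per year.

    rates_by_year: {year: [dSR/dt of the region's summits sampled that year]}.
    Each run draws one observed rate per year, adds Gaussian noise with the
    within-year s.d. (assumed reading of the caption), and accumulates.
    """
    rng = random.Random(seed)
    years = sorted(rates_by_year)
    runs = []
    for _ in range(n_runs):
        total, path = 0.0, []
        for year in years:
            values = rates_by_year[year]
            sd = statistics.pstdev(values)  # 0.0 when only one summit is available
            total += rng.choice(values) + rng.gauss(0.0, sd)
            path.append(total)
        runs.append(path)
    envelope = {}
    for i, year in enumerate(years):
        cuts = statistics.quantiles([run[i] for run in runs], n=20)
        envelope[year] = (cuts[0], cuts[-1])  # 5th and 95th percentiles
    return envelope
```

Fixing the random seed makes the shaded band reproducible between runs.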

Extended Data Fig. 2 | Relationship between rates of change in species richness across Europe and rates of increase in temperature (left column), rates of change in precipitation (middle column) and accumulated nitrogen deposition (right column). Trend lines are fitted with a simple linear model and are in many cases not significant. Change in species richness was quantified as the difference between vegetation surveys from the same summit at different times (Extended Data Fig. 1). No nitrogen data were available for Svalbard. The numbers of observations (comparisons of survey and resurveys) are: Svalbard, 7; Northern Scandes, 54; Southern Scandes, 27; Scotland, 7; NW Carpathians, 16; Eastern Alps, 122; Western Alps, 48; SE Carpathians, 9; Pyrenees, 12 (see Fig. 1 for more details).

Extended Data Fig. 3 | Historical and recent species richness versus sampling area. Historical species richness was exceeded within a small sampling area during recent resurveys. Species richness of the historical survey (yellow) is contrasted with a species richness accumulation curve of the recent surveys on summits where the highest occurrence of each recent species was estimated to the nearest 1 m of elevation. The number of species found historically within the uppermost 10 m of a summit was exceeded within the uppermost 5 m in the most recent resurveys. This analysis includes all 157 European summits for which such data are available, regardless of whether the historical species number was reached in recent times. The blue circle shows the average species richness of the recent surveys within the uppermost 10 m.
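The accumulation curve in this figure can be sketched as below, assuming each recent species is recorded with the depth (in whole metres below the summit) of its highest occurrence; all names are hypothetical.

```python
def accumulation_curve(depth_below_summit, max_depth=10):
    """Number of species found within the uppermost d metres, for d = 1..max_depth.

    depth_below_summit: {species: metres below the top of its highest occurrence,
    rounded to the nearest metre}.
    """
    return {d: sum(1 for v in depth_below_summit.values() if v <= d)
            for d in range(1, max_depth + 1)}


def depth_reaching(curve, target_richness):
    """Smallest depth at which the recent count reaches the historical richness."""
    for depth in sorted(curve):
        if curve[depth] >= target_richness:
            return depth
    return None
```

Comparing `depth_reaching(curve, historical_richness)` with 10 m indicates how much smaller an area the recent flora needed to match the historical species count.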

Extended Data Table 1 | Increase in species richness with time

Fixed effects: coefficients ± s.e.; random effects (Mountain, Region:Mount., ID): standard deviations.

Model    Intercept          Year of record       Mountain  Region:Mount.  ID    AICc
GLMM 1   -5.84 ± 0.35***    0.004 ± 0.0002***    0.97      -              -     5758
GLMM 2   -5.84 ± 0.35***    0.004 ± 0.0002***    0.88      0.41           -     5760
GLMM 3   -7.31 ± 0.57***    0.005 ± 0.0003***    0.75      0.60           0.22  5585
GLM      -7.60 ± 0.33***    0.006 ± 0.0002***    -         -              -     18256

Model    Intercept        Time < BP        Time > BP            Mountain  Region:Mount.  ID    AICc
GLMBM    2.73 ± 0.07***   0.001 ± 0.001    0.013 ± 0.001***     0.96      -              -     5684
GLMBM    2.73 ± 0.07***   0.001 ± 0.001    0.013 ± 0.001***     0.87      0.41           -     5686
GLMBM    2.64 ± 0.07***   0.001 ± 0.003    0.006 ± 0.0004***    0.83      0.49           0.22  5583

Generalized linear mixed effects models (Poisson family error distribution) show an increase in species richness with time (richness ~ year of record). Different random effect structures were applied. The lower panel includes a breakpoint in the relationship between species richness and time. The breakpoint was fitted independently by minimizing model deviance and was estimated around the year 1970. All models are based on 698 observations. Significant effects are indicated by asterisks (***P < 0.001). GLMM, generalized linear mixed effects model; GLM, generalized linear model; GLMBM, generalized linear mixed effects breakpoint model; ID, observation ID.

Extended Data Table 2 | Acceleration of the increase in species richness over time

Fixed effects: coefficients ± s.e.; random effects (Mountain, Region:Mount.): standard deviations.

Intercept                   Time                         Richness           Period  Mountain  Region:Mount.  AICc
-15.5 ± 2.06***             0.008 ± 0.001***             -                  -       5.8x10°   -              570.1
-15.5 ± 2.06***             0.008 ± 0.001***             -                  -       0.0       0.0            572.1
-13.4 ± 2.05***             0.007 ± 0.001***             0.004 ± 0.001***   -       0.0       -              561.7
-11.7 ± 4.76* (p = 0.014)   0.006 ± 0.002* (p = 0.012)   0.004 ± 0.001***   n.s.    0.0       -              575.1
-13.4 ± 2.05***             0.007 ± 0.001***             0.004 ± 0.001***   -       -         -              529.9

Intercept      Time < BP        Time > BP          Richness           Period  Mountain  Region:Mount.  AICc
0.07 ± 0.05    0.002 ± 0.003    0.013 ± 0.002***   -                  -       0.0       -              571.0
0.07 ± 0.05    0.002 ± 0.003    0.013 ± 0.002***   -                  -       0.0       0.0            573.1
0.02 ± 0.05    0.0001 ± 0.003   0.011 ± 0.002***   0.004 ± 0.001***   -       0.0       -              567.8
-0.09 ± 0.14   0.0004 ± 0.004   0.012 ± 0.004***   0.004 ± 0.001***   n.s.    0.0       -              580.7
0.02 ± 0.05    0.0001 ± 0.003   0.011 ± 0.002***   0.004 ± 0.001***   -       -         -              527.0

Linear mixed effects models (Gaussian family error distribution) showed an acceleration of the increase in species richness over time (ΔSR/Δt ~ t). Different random effect structures were implemented. The species richness from the summit’s first survey and the number of years between two consecutive observations (period) were included as additional explanatory variables. The lower panel further includes a breakpoint in the relationship between rate of richness change and time. The breakpoint was fitted independently by minimizing model deviance and was estimated for the year 1971. All models were based on 396 observations (comparison of survey and resurveys). Significant effects are indicated by asterisks (*P < 0.05, **P < 0.01, ***P < 0.001; P values > 0.001 are additionally reported in parentheses); n.s., not significant. Note that models without random structure performed best.

Extended Data Table 3 | Explanatory variables for velocity in species richness changes

Intercept       ΔT/Δt          ΔP/Δt              Naccum          Richness           AICc    AICc weight
0.01 ± 0.06     9.8 ± 1.1***   0.005 ± 0.001***   -0.16 ± 0.09    0.004 ± 0.001***   488.1   0.64
-0.06 ± 0.04    9.5 ± 1.1***   0.005 ± 0.001***   -               0.004 ± 0.001***   489.3   0.36
0.03 ± 0.06     9.1 ± 1.1***   -                  -0.17 ± 0.09    0.004 ± 0.001***   509.2   0.00
0.14 ± 0.06*    -              0.004 ± 0.001***   -0.07 ± 0.10    0.006 ± 0.001***   556.2   0.00

Results of linear mixed effects models (Gaussian family error) showing the relationship of the average velocity in species richness changes with the change in potential explanatory variables (temperature, precipitation, nitrogen deposition). Initial species richness on the summits was added as a further independent variable and indicated that species-rich systems showed a larger net change. The implemented model formula was lmer(ΔSR/Δt ~ ΔT/Δt + ΔP/Δt + Naccum + richness + (1 | mountain)). Model performance was compared using AICc, which also defines the order of models, with the best one on top. In addition, significant results from tests using F statistics are indicated by asterisks (*P < 0.05, ***P < 0.001). All values indicate model coefficients ± s.e. Rerunning the analyses after centring (subtracting the means) and scaling (dividing by s.d.) the explanatory variables indicated a larger coefficient and thus stronger effect of temperature than that of precipitation (ΔSR/Δt = 0.00 (±0.04) + 0.39 (±0.05) × ΔT/Δt*** + 0.22 (±0.04) × ΔP/Δt*** + 0.21 (±0.05) × richness***; asterisks indicate significant effects with ***P < 0.001). As no nitrogen data were available for the seven summits on Svalbard, the analyses presented in the table were performed on a subset of 389 temporal comparisons (comparing surveys and resurveys resulting from 684 observations). To account for spatial autocorrelation, we further repeated the full model averaging over all summits sampled over the same time period and falling in the same grid cell of the original climate data. The results of this model were qualitatively similar (ΔSR/Δt = −0.004 (±0.05) + 9.7 (±1.1) × ΔT/Δt*** + 0.005 (±0.001) × ΔP/Δt*** − 0.14 (±0.09) × Naccum + 0.005 (±0.001) × richness***).
Extended Data Table 4 | Explanatory variables for species richness changes
Intercept ΔT/Δt ΔP/Δt Naccum Richness AICc AICc weight
7.7 ±1.6*** 5.8 ±1.2*** – – −5.4 ±2.3* (P = 0.02) – 2950.2 0.56
L321 18** 6.2 ±1.3*** – 0.002 ±0.002 −5.1 ±2.4* (P = 0.03) – 2951.1 0.34
4.7 ±1.2*** 5.9 ±1.3*** – 0.003 ±0.002 – – 2953.6 0.10
12.0 ±1.5*** – – −0.001 ±0.002 −3.8 ±2.4 – 2969.9 0.00


Linear mixed effects models (Gaussian family error distribution) showing the direct relationship between species richness changes and changes in potential explanatory variables (temperature, precipitation, nitrogen deposition). Initial species richness on the summit was not added as a further independent variable, as it did not show significant effects in any of the models. The implemented model formula was lmer(ΔSR ~ ΔT + ΔP + Naccum + richness + (1 | mountain)). Variable performance was compared using AICc, which also sets the order of the models, with the best one on top. Additional significance tests using F statistics are indicated by asterisks (*P < 0.05, **P < 0.01, ***P < 0.001; P values > 0.001 are additionally reported in brackets). All values indicate model coefficients ± s.e. Rerunning the analyses after centring (subtracting the means) and scaling (dividing by the standard deviations) indicated a larger coefficient, and thus a stronger effect, of temperature than of precipitation (ΔSR ~ 0.05 (±0.06) + 0.25 (±0.05) × ΔT*** + 0.05 (±0.05) × ΔP − 0.11 (±0.05) × Naccum*). The analyses were performed with the same data as specified in Extended Data Table 3.
Extended Data Table 5 | Model evaluation for different explanatory variables and time periods

Temperature

Explanatory variable Period (years) AICc ΔAICc AICc weights
Summer temperature 15 488.1 0.0 1.0
Annual temperature 15 496.5 8.4 0.0
Spring temperature 10 507.6 19.5 0.0
Annual temperature 7 509.1 20.9 0.0
Spring temperature 7 513.2 25.1 0.0
Summer temperature 7 514.7 26.6 0.0
Annual temperature 10 516.0 27.9 0.0
Annual temperature 30 517.4 29.3 0.0
Spring temperature 15 517.6 29.4 0.0
Summer temperature 5 526.7 38.5 0.0
Annual temperature 3 526.9 38.7 0.0
Spring temperature 30 528.3 40.2 0.0
Summer temperature 1 530.6 42.5 0.0
Summer temperature 30 532.5 44.4 0.0
Annual temperature 1 534.9 46.8 0.0
Annual temperature 5 535.5 47.3 0.0
Summer temperature 10 545.6 57.5 0.0
Spring temperature 5 546.2 58.1 0.0
Summer temperature 3 547.1 58.9 0.0
Spring temperature 1 548.2 60.1 0.0
Spring temperature 3 551.4 63.3 0.0
Precipitation

Explanatory variable Period (years) AICc ΔAICc AICc weights
Annual precipitation 1 488.1 0.0 1.0
Snow precipitation 1 495.2 7.1 0.0
Winter precipitation 15 501.3 13.2 0.0
Annual precipitation 30 502.7 14.5 0.0
Snow precipitation 3 502.9 14.8 0.0
Winter precipitation 1 504.4 16.2 0.0
Snow precipitation 30 504.7 16.5 0.0
Winter precipitation 5 505.6 17.5 0.0
Summer precipitation 30 506.0 17.9 0.0
Winter precipitation 30 507.7 19.6 0.0
Summer precipitation 5 507.7 19.6 0.0
Snow precipitation 10 508.2 20.1 0.0
Snow precipitation 15 509.2 21.1 0.0
Snow precipitation 5 509.5 21.3 0.0
Annual precipitation 5 509.7 21.6 0.0
Annual precipitation 15 509.7 21.6 0.0
Winter precipitation 3 509.8 21.6 0.0
Annual precipitation 10 510.3 22.1 0.0
Summer precipitation 15 510.4 22.2 0.0
Summer precipitation 10 510.4 22.3 0.0
Summer precipitation 3 510.6 22.5 0.0
Summer precipitation 7 510.9 22.8 0.0
Winter precipitation 10 511.0 22.9 0.0
Annual precipitation 3 511.2 23.1 0.0
Annual precipitation 7 511.2 23.1 0.0
Snow precipitation 7 511.2 23.1 0.0
Summer precipitation 1 511.3 23.1 0.0
Winter precipitation 7 511.3 23.1 0.0
Nitrogen

Explanatory variable Period AICc ΔAICc AICc weights
NO accumulation – 488.1 0.0 0.6
NH accumulation – 489.0 0.9 0.4


Linear mixed effects models (Gaussian family error distribution) analysing the relationship between the average velocity of species richness changes and the change in potential explanatory variables (temperature, precipitation and nitrogen deposition). The implemented model formula was lmer(ΔSR/Δt ~ ΔT/Δt + ΔP/Δt + Naccum + richness + (1 | mountain)). Within each new model, the focal variable (left column) was exchanged, while the remaining variables were held constant. Variables were calculated as the mean value across a period before the survey (Period). The analyses were performed with the same data as in Extended Data Table 3.
Extended Data Table 6 | Model evaluation for different time lags

Summer temperature (15-year mean)
Time lag AICc ΔAICc AICc weights
0 496.5 0.0 1.0
5 531.3 34.8 0.0
10 546.5 50.0 0.0

Annual precipitation (1-year mean)

Time lag AICc ΔAICc AICc weights
0 507.7 0.0 0.72
5 510.7 3.0 0.16
10 511.3 3.6 0.12
Nitrogen accumulation

Time lag AICc ΔAICc AICc weights
0 488.1 0.0 0.34
5 488.2 0.1 0.33
10 488.2 0.1 0.33


Linear mixed effects models (Gaussian family error distribution) analysing the relationship between the average velocity of species richness changes and the change in potential explanatory variables (temperature, precipitation and nitrogen deposition). The implemented model formula was lmer(ΔSR/Δt ~ ΔT/Δt + ΔP/Δt + Naccum + richness + (1 | mountain)). Explanatory variables were calculated as the mean value across a period before the survey. Within each new model, the focal explanatory variable, implemented with a different time lag (the time between the period and the survey; left column), was exchanged while the remaining variables were held constant. The analyses were performed with the same data as in Extended Data Table 3.
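The AICc weights in these tables can be derived from the AICc values alone. The sketch below assumes the standard definitions, the small-sample correction AICc = AIC + 2k(k + 1)/(n − k − 1) and Akaike weights w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2) with Δ_i the difference from the lowest AICc; the function names are hypothetical and this is not the authors' code. Applied to the reported AICc values for annual precipitation at time lags of 0, 5 and 10, it gives weights of about 0.72, 0.16 and 0.12.

```python
import math


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected AIC for a model with k parameters fitted to n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values: list[float]) -> list[float]:
    """Convert AICc values into Akaike weights that sum to 1."""
    best = min(aicc_values)
    deltas = [a - best for a in aicc_values]        # delta AICc relative to the best model
    rel_lik = [math.exp(-0.5 * d) for d in deltas]  # relative likelihood of each model
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


# Reported AICc values for annual precipitation at time lags 0, 5 and 10.
weights = akaike_weights([507.7, 510.7, 511.3])
print([round(w, 2) for w in weights])  # [0.72, 0.16, 0.12]
```

Working from the AICc column rather than the printed ΔAICc column avoids propagating any transcription error in the differences.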
Extended Data Table 7 | Trait differences between colonizing and old-established species

Trait | Fixed effect: intercept (coefficient ± s.e.) | Fixed effect: difference of colonizers relative to established species (coefficient ± s.e.) | Random effect: resurvey (s.d.)
Plant height | −0.234 ±0.022*** | +0.292 ±0.022*** | 0.31
SLA | −0.077 ±0.017*** | +0.158 ±0.024*** | 0.13
Seed mass | −0.014 ±0.017 | +0.003 ±0.025 | 0.09
Temperature indicator | −0.188 ±0.023*** | +0.221 ±0.020*** | 0.35
Linear mixed effects models (Gaussian family error distribution) revealed systematic trait differences between colonizing and old-established species. Analyses were implemented for 364 resurveys (12,738 observations of 815 species) with a random effect of resurvey. Temperature indicator values were available for 90%, specific leaf area (SLA) for 61%, plant height for 76% and seed mass for 53% of the observations. Significant effects are indicated by asterisks (***P < 0.001). Trait raw data were first log-transformed, then centred to zero mean and scaled to s.d. = 1 before analysis.
A small peptide modulates stomatal control via abscisic acid in long-distance signalling


Fuminori Takahashi, Takehiro Suzuki, Yuriko Osakabe, Shigeyuki Betsuyaku, Yuki Kondo, Naoshi Dohmae, Hiroo Fukuda, Kazuko Yamaguchi-Shinozaki & Kazuo Shinozaki


Mammalian peptide hormones propagate extracellular stimuli from sensing tissues to appropriate targets to achieve optimal growth maintenance. In land plants, root-to-shoot signalling is important to prevent water loss by transpiration and to adapt to water-deficient conditions. The phytohormone abscisic acid has a role in the regulation of stomatal movement to prevent water loss. However, no mobile signalling molecules have yet been identified that can trigger abscisic acid accumulation in leaves. Here we show that the CLAVATA3/EMBRYO-SURROUNDING REGION-RELATED 25 (CLE25) peptide transmits water-deficiency signals through vascular tissues in Arabidopsis, and affects abscisic acid biosynthesis and stomatal control of transpiration in association with BARELY ANY MERISTEM (BAM) receptors in leaves. The CLE25 gene is expressed in vascular tissues, and its expression is enhanced in roots in response to dehydration stress. The root-derived CLE25 peptide moves from the roots to the leaves, where it induces stomatal closure by modulating abscisic acid accumulation and thereby enhances resistance to dehydration stress. BAM receptors are required for the CLE25 peptide-induced dehydration stress response in leaves, and the CLE25-BAM module therefore probably functions as one of the signalling molecules for long-distance signalling in the dehydration response.

The Arabidopsis genome contains more than 7,000 small open reading frames with no known functional annotations. Several secreted peptides mediate cellular development in plants. However, it is unclear whether peptide hormones mediate long-distance signalling in response to abiotic stress. CLAVATA3 (CLV3) is a well-characterized plant peptide involved in shoot apical meristem formation. The Arabidopsis genome contains 32 CLAVATA3/EMBRYO-SURROUNDING REGION-related (CLE) family genes. Tracheary element differentiation inhibitory factor (TDIF) is involved in the formation of vascular tissue that functions in water transport.

To determine whether CLE peptides modulate long-distance signalling in the dehydration stress response, the induction in leaves of NINE-CIS-EPOXYCAROTENOID DIOXYGENASE 3 (NCED3), a gene that encodes a key enzyme for abscisic acid (ABA) synthesis under dehydration stress, was analysed after treating roots with 27 synthetic CLE peptides. Of these peptides, the application of CLE25 to roots induced NCED3 expression and enhanced ABA accumulation in leaves (Extended Data Fig. 1a–c). Stomatal responses to the application of CLE25, CLV3, CLE46 and TDIF peptides were analysed using CLE26 as a negative control, because this peptide is the most homologous to CLE25 and mediates root development (Extended Data Fig. 1d). CLE25 application induced a level of stomatal closure similar to that induced by ABA application (Fig. 1a and Extended Data Fig. 1e), whereas CLE26, CLV3, CLE46 and TDIF application did not induce stomatal closure (Extended Data Fig. 1f). Analysis of the dose-dependent effect of CLE25 on stomatal response showed that CLE25 functions in stomatal closure at nanomolar concentrations (Fig. 1b), indicating that CLE25 also functions as a hormone. Nano-liquid chromatography-tandem mass spectrometry (nLC-MS/MS) was performed using leaves from plants whose roots had been treated with a non-labelled or isotope-labelled synthetic CLE25 peptide. Accumulation of this peptide was detected in leaves (Extended Data Fig. 1g, h), indicating that synthetic CLE25 moved from roots to leaves and acted as a functional mobile signal.

CLE25 expression was enhanced in roots in response to dehydration stress (Fig. 2a). Many genes involved in ABA biosynthesis and transport are expressed in vascular tissues; vascular tissue is therefore thought to be important for ABA biosynthesis under dehydration stress conditions. β-Glucuronidase (GUS) reporter-aided histochemical analysis showed CLE25 promoter activity in lateral roots, the root tip, the vascular veins of leaves and the procambium of primary roots (Fig. 2b, c). In situ hybridization analysis also detected CLE25 mRNA in the procambium of root vascular tissues (Fig. 2d).

Using CRISPR-Cas9 genome editing, knockout mutants of the CLE25 gene (cle25) were generated. A CEL-I assay of the mutant genomes confirmed a CRISPR-Cas9-mediated mutation (Extended Data Fig. 2a). Isolated cle25 homozygous mutants contained a nonsense mutation at the guanosine at position 22 in the CLE25 coding region (Extended Data Fig. 2b). Dehydration-induced NCED3 expression was repressed in two cle25 mutants (Fig. 3a). The clv3-8,
Fig. 1 | The application of the CLE25 peptide affects stomatal closure in leaves. a, Roots of whole plants were incubated without (0 h on x axis, n = 401) or with 0.01% acetonitrile (mock: n = 432, 1 h; n = 551, 2 h; n = 506, 3 h), ABA (n = 586, 1 h; n = 548, 2 h; n = 514, 3 h) or CLE25 peptide (n = 583, 1 h; n = 509, 2 h; n = 544, 3 h) solution for the times indicated. Data are from three experiments. b, Detached rosette leaves were incubated without (0 h on x axis, n = 628) or with mock (n = 583) or with each concentration of CLE25 peptide (n = 647, 1 nM; n = 644, 10 nM; n = 664, 100 nM; n = 622, 1,000 nM) for 3 h. Data are from three experiments. ***P < 0.001 as analysed by one-way ANOVA followed by a Tukey–Kramer post hoc test (a, b).
Fig. 2 | CLE25 is expressed in vascular tissues of roots and leaves. a, Tissue-specific expression of CLE25 in response to dehydration stress (n = 6 biological replicates). *P < 0.05, **P < 0.01 as analysed by one-way ANOVA followed by a Tukey's post hoc test. b, Tissue specificity of GUS staining in the roots and leaves of pCLE25::GUS plants (n = 8 transgenic plants). Scale bar, 0.5 mm. c, GUS staining of procambium in the differentiation zone of the root of a pCLE25::GUS plant (n = 4 transgenic plants). Scale bar, 50 μm. d, RNA in situ hybridization with the CLE25 antisense or sense probe in cross-sections of roots of wild-type plants (n = 12 biological replicates). Pc, procambium; Ph, phloem; Xy, xylem. Scale bars, 50 μm.

cle46-1 and tdr-1 (a mutant of TDIF RECEPTOR (TDR, also known as PXY)) mutants did not show repressed NCED3 expression relative to wild-type plants in response to dehydration stress (Extended Data Fig. 2c–e). Two ABA-inducible genes, AT3G02480 from the LATE EMBRYOGENESIS ABUNDANT family (here referred to as LEA) and RESPONSIVE TO DESICCATION 29B (RD29B; also known as LTI65), were also suppressed in two cle25 mutants in response to dehydration stress (Fig. 3b, c), suggesting an important role for CLE25 in the regulation of NCED3 and ABA-induced gene expression. NCED3 expression was strongly correlated with ABA accumulation under dehydration stress conditions. After dehydration for three hours, ABA levels increased sixfold in the leaves of wild-type


plants (Fig. 3d), whereas ABA accumulation in roots of wild-type plants was one-tenth of that in leaves under control conditions and did not increase in response to dehydration stress. Accumulation of ABA in response to dehydration stress was abolished in leaves of cle25 mutants. Treatment with the CLE25 peptide did not induce stomatal closure in the ABA-deficient mutants nced3-2 and aba2-1, although ABA treatment did, which suggests that CLE25 modulates stomatal closure in a manner dependent on ABA accumulation (Extended Data Fig. 3). Stomatal conductance of cle25 mutants was similar to that of wild-type plants under control conditions (Extended Data Fig. 4a). Water loss was greater in cle25 mutants than in wild-type plants within one hour, which indicates that CLE25 may modulate
Fig. 3 | CLE25 CRISPR-Cas9-derived knockout (cle25) mutants affect expression of dehydration-induced genes, ABA accumulation, water loss and dehydration stress sensitivity. a–c, Dehydration-induced NCED3 (a), LEA (b) and RD29B (c) expression in wild-type plants and cle25 mutants (n = 6 biological replicates). d, ABA content of leaves or roots in wild-type plants and cle25 mutants in response to dehydration stress (n = 8 biological replicates). e, Water loss in wild-type plants and cle25 mutants in response to dehydration stress (n = 6 biological replicates). *P < 0.05, **P < 0.01 as analysed by one-way ANOVA followed by a Tukey's post hoc test (a–e). f, cle25 mutants had a dehydration-sensitive phenotype (n = 90 plants per group; data from four experiments). WT, wild type. Scale bars, 2 cm.
Fig. 4 | CLE25 peptide moves from roots to leaves and modulates NCED3 expression in leaves in association with the receptor-like kinases BAM1 and BAM3. a, NCED3 expression in the leaves of wild-type plants and cle25 mutants (n = 6 biological replicates) after CLE25 application to roots. b, NCED3 expression in grafted leaves of wild-type/wild-type, wild-type/nced3-2 and wild-type/aba2-1 plants (n = 6 biological replicates). c, Dehydration-induced NCED3 expression in leaves of plants in which shoots and roots were grafted between wild-type and cle25 #10 plants (n = 6 biological replicates). d, Dehydration-induced NCED3 expression in the bam mutants (n = 6 biological replicates). e, ABA content of leaves and roots in wild-type plants and bam1-5 bam3-3 mutants in response to dehydration stress (n = 8 biological replicates). f, bam1-5 bam3-3 mutants had a dehydration-sensitive phenotype (n = 110 plants per group; data from four experiments). Scale bars, 2 cm. g, NCED3 expression in leaves of plants in which shoots and roots were grafted between wild-type and bam1-5 bam3-3 plants (n = 6 biological replicates). h, NCED3 expression in the leaves of wild-type plants and bam2-5 bam3-3 mutants (n = 6 biological replicates). **P < 0.01 as analysed by one-way ANOVA followed by a Tukey's post hoc test (a–e, g and h).

stomatal responses in association not only with ABA accumulation but also with other rapid signals, including hydraulic water tension, in response to dehydration stress (Fig. 3e). The cle25 mutants grew normally in soil under well-watered conditions (Extended Data Fig. 4b–d), but exhibited marked sensitivity to dehydration stress (Fig. 3f). CLE25 RNA interference (RNAi) knockdown plants were also generated, in which CLE25 expression was consistently repressed (Extended Data Fig. 5a). In CLE25 RNAi plants, expression levels of NCED3, LEA and RD29B were repressed compared with those of wild-type plants (Extended Data Fig. 5b–d). CLE25 RNAi plants exhibited a dehydration-sensitive phenotype similar to that of the nced3-2 mutant (Extended Data Fig. 5e). These findings suggest that CLE25 is functionally important in dehydration stress responses and tolerance.

Post-translational processing is critical for the biological functions of CLE peptides. Recent studies have demonstrated that mature CLE peptides undergo post-translational modifications, such as arabinosylation or the hydroxylation of proline residues. Extracellular secretion of the CLE25 peptide with two hydroxyproline residues was detected in the culture medium of Arabidopsis T87 cells (Extended Data Fig. 6). The application of CLE25 to roots increased NCED3 expression in leaves of both cle25 mutants and CLE25 RNAi plants, indicating that CLE25 application modulates NCED3 expression in leaves without


endogenous CLE25 peptide expression (Fig. 4a and Extended Data Fig. 7a, b). Wild-type shoot scions grafted onto nced3-2 or aba2-1 rootstocks exhibited an enhanced level of NCED3 expression in leaves after CLE25 application to roots, similar to that of control graft plants (wild type/wild type; scion/rootstock) (Fig. 4b), suggesting that CLE25, but not root-derived ABA, modulates NCED3 expression in leaves.

Grafted cle25 #10/cle25 #10 plants (numbers with # indicate the independent line number of cle25 mutants) exhibited reduced NCED3 expression in leaves in response to dehydration stress, whereas clear NCED3 induction by dehydration was observed in wild-type/wild-type plants (Fig. 4c). Compared with cle25 #10/cle25 #10 plants, cle25 #10/wild-type plants exhibited enhanced NCED3 expression in dehydrated leaves, with expression levels reaching about 80% of those in wild-type/wild-type plants, which suggests that root-derived endogenous CLE25 modulates NCED3 expression in leaves. Wild-type/cle25 #10 plants exhibited dehydration-induced NCED3 expression similar to that of cle25 #10/wild-type plants. This indicates that the CLE25 gene in the shoot, which is not induced in response to dehydration (Fig. 2a), is also sufficient to induce NCED3 expression to about 80% of the level in wild-type/wild-type plants. Plants obtained by the reciprocal grafting of wild-type or CLE25 RNAi (CLE25 RNAi #13) shoot scions onto wild-type or CLE25 RNAi #13 rootstocks had similar NCED3 expression patterns in their leaves (Extended Data Fig. 7c, d). nLC-MS/MS analyses showed the accumulation of endogenous CLE25 in dehydrated leaves of cle25 #10/wild-type plants (Extended Data Fig. 8). These results indicate that CLE25 contributes to long-distance cellular communication.

To identify receptor-like kinases that can recognize CLE25 and modulate NCED3 expression, several mutants of candidate receptor-like kinases related to CLE peptides, including CLV and BAM receptors, were selected. Dehydration-induced NCED3 expression was repressed in the bam1-5 bam3-3 mutant (Fig. 4d). The accumulation of ABA in response to dehydration stress was abolished in leaves of the bam1-5 bam3-3 mutant (Fig. 4e), which also exhibited marked sensitivity to dehydration stress (Fig. 4f). Salinity stress responses partly share signalling with dehydration stress responses, and the cle25 and bam1-5 bam3-3 mutants also exhibited sensitivity to salinity stress (Extended Data Fig. 9). CLE25 application to leaves did not increase NCED3 expression in the leaves of bam1-5 bam3-3 mutants (Extended Data Fig. 10a). CLE25 application to roots did not increase NCED3 expression in leaves of grafted bam1-5 bam3-3/bam1-5 bam3-3 and bam1-5 bam3-3/wild-type plants, although grafted wild-type/wild-type and wild-type/bam1-5 bam3-3 plants did exhibit enhanced NCED3 expression (Fig. 4g). Confocal sectional analysis of cle25 and bam1-5 bam3-3 mutants suggested that endogenous CLE25, BAM1 and BAM3 do not affect protoxylem and metaxylem vessel formation or the vascular development of leaves and roots (Extended Data Fig. 10b–m). The application of CLE25 to roots increased NCED3 expression in leaves of bam2-5 bam3-3 mutants (Fig. 4h). These results suggest that CLE25 moving from roots to leaves modulates NCED3 expression in association with BAM1 and BAM3 receptors in leaves. By contrast, BAM1 and BAM2 mainly mediate leaf development. However, each receptor may mediate different functions under dehydration stress conditions.

These findings demonstrate that the CLE25-BAM pair functions as one of the modules of long-distance signalling in response to dehydration stress. The application of CLE25 and CLE26 inhibits root growth but not protoxylem vessel formation, suggesting that CLE25- and CLE26-induced root growth inhibition differs from that caused by other CLE peptides. This root growth inhibition was ABA-independent (Extended Data Fig. 10n–s). By contrast, the application of CLE25 modulates dehydration responses in leaves, but the application of CLE26 does not (Fig. 1 and Extended Data Fig. 1a–f). The physiological functions of CLE25 may therefore be controlled differently in different target tissues. Lotus japonicus CLE-root signal proteins propagate nitrate status signals from roots to shoots. Thus, peptide hormones can strongly coordinate information between the underground and aerial parts of plants. The CLE25-BAM module may transmit dehydration stress signals to specific tissues more precisely than the ABA regulatory system does.
METHODS


Plant materials and growth conditions. Arabidopsis thaliana ecotype Col-0 was used as a wild-type control and as the genetic background of the transgenic lines. Wild-type, transgenic and mutant plants were grown on germination medium agar plates containing 1% sucrose under long-day conditions (16 h light:8 h dark) at 22 °C. nced3-2 was provided by K. Urano (RIKEN Center for Sustainable Resource Science). The cle46-1 (SALK_207109C), bam1-5 (SALK_152555), bam2-5 (GK-791G02) and bam3-3 (SALK_118860) mutant seeds were obtained from the Arabidopsis Biological Resource Center (Columbus). A. thaliana T87 cells were cultured in 100 ml of Jouanneau and Péaud-Lenoël (JPL) medium with gentle agitation (110 r.p.m.) under continuous illumination at 22 °C. A 1-ml aliquot of the cell suspension was transferred to fresh medium every 10 days.

Synthetic CLE peptide treatment. All synthesized CLE peptides contained two hydroxyproline residues, and the purity of each synthesized peptide was >95%. At 25 days after germination, seedlings of control and grafted plants were transferred to 3 ml of water for 16 h. The water was then replaced with opening buffer (20 mM KCl, 1 mM CaCl₂ and 5 mM 2-(N-morpholino)ethanesulfonic acid-KOH, pH 6.15), and the seedlings were incubated for 2 h under light (180–200 μmol photons m⁻² s⁻¹). Plant roots (Fig. 4a, b, g, h and Extended Data Figs. 1b, c, 7a, b) or detached leaves (Extended Data Fig. 10a) were then transferred to sterile water containing 1 μM synthetic CLE peptides or 0.01% acetonitrile solution (mock treatment), or 5 μM synthetic CLE peptides or 0.05% acetonitrile solution (mock treatment; Extended Data Fig. 1a), and incubated for the times indicated in the figures. All rosette leaves were collected for gene expression analysis by quantitative RT-PCR (Fig. 4a, b, g, h and Extended Data Figs. 1a, b, 7a, b, 10a).

Measurement of stomatal apertures. Four-week-old soil-grown Arabidopsis plants, or rosette leaves detached from three-week-old soil-grown Arabidopsis plants, were placed on glass slides in opening buffer with abaxial sides facing up for 2 h under light (180–200 μmol photons m⁻² s⁻¹) to open the stomata. Roots of whole plants or detached leaves were then transferred to sterile water containing 1 μM ABA, 1 μM synthetic peptides or 0.01% acetonitrile solution (mock), and incubated for the times indicated to close the stomata. To analyse intact plant leaves, images of stomatal apertures were obtained using Suzuki's Universal Micro-Printing (SUMP) method with SUMP liquid and SUMP plate C (SUMP Laboratory). SUMP images were observed using an AxioPlan 2 Microscope System (Carl Zeiss). The width and length of each stomatal aperture were measured with PhotoRuler version 1.1.3, and stomatal opening was calculated as the width:length ratio.
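The aperture index above, together with the one-way ANOVA named in the figure legends, can be illustrated with a short sketch. It computes width:length ratios and the one-way ANOVA F statistic (between-group mean square over within-group mean square) by hand; the Tukey–Kramer post hoc comparison is not reproduced, and the measurements and function names are hypothetical.

```python
from statistics import mean


def aperture_ratio(width_um: float, length_um: float) -> float:
    """Stomatal opening expressed as the width:length ratio of the aperture."""
    return width_um / length_um


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical (width, length) measurements in micrometres for two treatments.
mock = [aperture_ratio(width, length) for width, length in [(4.0, 16.0), (4.4, 16.5), (3.9, 15.8)]]
peptide = [aperture_ratio(width, length) for width, length in [(1.8, 15.9), (2.1, 16.2), (1.6, 15.5)]]

print(round(one_way_anova_f([mock, peptide]), 1))
```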

Purification of endogenous CLE25 peptides. On day 7 after subculture, cultured Arabidopsis T87 cells were subjected to stress treatment. Cell concentration was adjusted to 150–200 mg ml⁻¹. Stress was applied for 4 h by adding JPL medium with or without 1 M mannitol; the final mannitol concentration was 0.4 M. After treatment, the liquid culture medium was filtered through a polyethersulfone membrane (0.1-μm pore size). Peptides were purified from the culture medium by solid-phase extraction followed by ion-exchange and reversed-phase chromatography. The dried medium was dissolved in 300 ml of water and subjected to solid-phase extraction (InertSep C18-B 60 mL; GL Sciences). After washing with aqueous 0.1% trifluoroacetic acid, the sample was eluted with 15% acetonitrile in the same solution. The eluate was concentrated by lyophilization and applied to an ion-exchange column (PolySULFOETHYL-A, 2.1 × 100 mm; Poly-LC) equilibrated with 20 mM sodium phosphate buffer (pH 3.0) at a flow rate of 0.1 ml min⁻¹, and eluted with a 3-ml linear gradient of 0–0.5 M NaCl in the equilibration buffer. The fraction eluted at the retention time corresponding to the synthetic CLE25 peptide was collected and purified further by reversed-phase chromatography on an Inertsil ODS-3 column (1 × 100 mm; GL Sciences), eluting with a 12.5–50% gradient of acetonitrile in aqueous 0.1% trifluoroacetic acid over 60 min.
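Adding medium containing 1 M mannitol to reach a final concentration of 0.4 M implies a particular mixing ratio. As a worked check, assuming the cell suspension contains no mannitol and that volumes are additive (neither is stated explicitly), solving 0.4 = 1 × V_s / (V_c + V_s) gives V_s = (2/3) V_c, that is, two volumes of mannitol medium per three volumes of suspension. The helper below is hypothetical and only performs this rearrangement.

```python
def stock_volume_ml(final_molar: float, stock_molar: float, culture_ml: float) -> float:
    """Volume of stock to add to a solute-free culture to reach final_molar.

    Rearranges final = stock * Vs / (Vc + Vs), assuming additive volumes.
    """
    if not 0 <= final_molar < stock_molar:
        raise ValueError("final concentration must be below the stock concentration")
    return final_molar * culture_ml / (stock_molar - final_molar)


# Example: 30 ml of suspension (a hypothetical volume) needs 20 ml of 1 M mannitol medium.
print(round(stock_volume_ml(0.4, 1.0, 30.0), 3))
```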

Arabidopsis leaves that had absorbed 5 μM non-labelled or isotope-labelled CLE25 peptide from roots for 3 h, or leaves of wild-type or cle25 #10 shoot scions grafted onto wild-type rootstocks after 3 h of dehydration stress, were used for peptide extraction. Peptides were fractionated with an Amicon Ultra 10K filter (Merck Millipore) and then purified using GL-Tip SDB and GC columns (GL Sciences).

Mass spectrometry analysis. To analyse the endogenous CLE25 peptide from Arabidopsis T87 suspension-culture cells, the fraction collected at the retention time corresponding to the synthetic CLE25 peptide was subjected to nLC-MS/MS using a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides were separated using a nano-ESI spray column (100-mm length × 75-μm internal diameter, 3-μm opening, NTCC analytical column C18; Nikkyo Technos) equilibrated with buffer A (0.1% aqueous formic acid) and eluted with a linear gradient of 30% buffer B (0.1% formic acid in 100% acetonitrile) over 20 min at a flow rate of 300 nl min⁻¹ (Easy nLC; Thermo Fisher Scientific). The mass spectrometer was operated in positive-ion mode, and MS/MS spectra were acquired in targeted MS/MS mode (m/z = 459.24 and 688.36).
To detect the CLE25 peptide, nLC-MS/MS analyses were conducted with the following instruments: an Autosampler-2 1D plus, NanoLC Ultra (Eksigent Technologies) and TripleTOF 5600 (SCIEX). An L-column ODS C-18 (5-mm length × 0.3-mm internal diameter, 5-μm opening) was used as a sample trap, and an L-column Micro C-18 (150-mm length × 75-μm internal diameter, 5-μm opening) was used to prepare peptide samples (Chemical Evaluation and Research Institute). The injection volume was 1 μl and the flow rate was 300 nl min⁻¹. The mobile phases comprised 2% acetonitrile and 0.1% formic acid (A) and 80% acetonitrile and 0.1% formic acid (B). The linear gradient comprised A:B = 98:2 to A:B = 60:40 for 125 min, A:B = 10:90 for 5 min and A:B = 98:2 for 20 min. An ion spray voltage of 1,400 V was applied via the metal connector with a Dream Spray closed-type nanospray source (AMR). The MS scan ranged from m/z 400 to 1,250. The top 10 precursor ions were selected for subsequent MS/MS scans in high-sensitivity mode. Peaks at m/z 688.34 [M+2H]²⁺ (non-labelled CLE25 peptide), 691.35 [M+2H]²⁺ (isotope-labelled CLE25 peptide) or 459.24 [M+3H]³⁺ (endogenous CLE25 peptide), and their product ions, were detected by nLC-MS/MS.
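The precursor m/z values listed in this section can be cross-checked by converting each ion back to a neutral mass, M = z × (m/z) − z × m_p, where m_p ≈ 1.007276 Da is the proton mass. In this minimal sketch (a hypothetical helper, not part of the authors' workflow), the [M+2H]²⁺ target at m/z 688.36 and the [M+3H]³⁺ target at m/z 459.24 give neutral masses within about 0.01 Da of each other, consistent with both being the same CLE25 peptide, and the isotope-labelled [M+2H]²⁺ ion at m/z 691.35 is about 6 Da heavier than the non-labelled ion at m/z 688.34. The composition of the label is not stated here.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of a peptide observed as an [M+zH]z+ ion at the given m/z."""
    return charge * mz - charge * PROTON_MASS_DA


m_from_2plus = neutral_mass(688.36, 2)  # Q-Exactive target, [M+2H]2+
m_from_3plus = neutral_mass(459.24, 3)  # Q-Exactive target, [M+3H]3+
label_shift = neutral_mass(691.35, 2) - neutral_mass(688.34, 2)  # TripleTOF peaks

print(f"M(2+) = {m_from_2plus:.3f} Da, M(3+) = {m_from_3plus:.3f} Da")
print(f"mass shift of the labelled peptide = {label_shift:.2f} Da")
```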

Extended Data Fig. 6c lists the top 10 proteins with the highest accumulation in T87 cells in response to mannitol treatment. The concentrations of these proteins in T87 cells were similar under mannitol and control conditions. The columns for accumulation in the liquid culture medium show the amounts of these proteins in the medium of T87 cells cultured with or without mannitol; these amounts were also similar under control conditions and after mannitol treatment.

Plasmid construction. The vector pGreenII 0129, which contained the CaMV35S 
promoter and the PDK intron of pkKANNIBAL, was used to prepare the 
RNAi construct. A 215-bp fragment (5‘-ATGCTTGTTTTTTGCTTCCCATTT 
CGCTTTCCCCTTTTTTGAGCCTCTTCTGTCCAAAGATATCTCTCTCTA 
TTTATGTGACAGTCACTTCACCAACATCATGGATGTTCTGCTCAGTTT 
ATTCTTGGGTTTGGTTGGTCAGTTGTTTATGTTAAACAGGAAGCTGTAG 
GGACATAGGTTTCAGTATGGGTGGAAATGGCATTAGAGCTTTGGTT-3’) 
corresponding to the leader sequence and first exon of CLE25 was isolated by
PCR with incorporated restriction sites that produced compatible ends.
The CasOT algorithm was used to design suitable guide RNAs (gRNAs) without
off-targets via the website ‘focas’ (http://focas.ayanel.com/). The designed
18-bp gRNA (5′-GAAATGGCATTAGAGCTT-3′) was inserted into the BsaI
restriction site of the CRISPR-Cas9 vector pEgP526-2A-GFBSD2 or
pEgP126_Paefl-2A-GFPSD2. These plasmids were transformed into
Agrobacterium tumefaciens strain GV3101 and then introduced into Arabidopsis
using the floral dipping method.
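The 18-nt gRNA can be checked against the 215-bp CLE25 fragment given above. The spacer occurs once on the forward strand and is immediately followed by 5′-TGG-3′, which fits the NGG protospacer adjacent motif of Streptococcus pyogenes Cas9. This sketch assumes the pEgP vectors express SpCas9, which is not stated here, and it searches only the forward strand. The function names are illustrative.

```python
# 215-bp CLE25 fragment and 18-nt gRNA spacer, copied from the Methods text.
FRAGMENT = (
    "ATGCTTGTTTTTTGCTTCCCATTT"
    "CGCTTTCCCCTTTTTTGAGCCTCTTCTGTCCAAAGATATCTCTCTCTA"
    "TTTATGTGACAGTCACTTCACCAACATCATGGATGTTCTGCTCAGTTT"
    "ATTCTTGGGTTTGGTTGGTCAGTTGTTTATGTTAAACAGGAAGCTGTAG"
    "GGACATAGGTTTCAGTATGGGTGGAAATGGCATTAGAGCTTTGGTT"
)
GRNA = "GAAATGGCATTAGAGCTT"


def find_target(sequence: str, spacer: str):
    """Return (0-based start, 3-nt motif 3' of the spacer), or None if absent.

    Searches the forward strand only.
    """
    start = sequence.find(spacer)
    if start == -1:
        return None
    end = start + len(spacer)
    return start, sequence[end:end + 3]


def is_ngg(motif: str) -> bool:
    """True for a 5'-NGG-3' motif (the SpCas9 PAM; SpCas9 is an assumption here)."""
    return len(motif) == 3 and motif[1:] == "GG"


print(len(FRAGMENT))                 # 215
start, pam = find_target(FRAGMENT, GRNA)
print(start + 1, pam, is_ngg(pam))   # 1-based start of the spacer, PAM, PAM check
```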

Analysis of mutations at CRISPR-Cas9 target sites. Genomic DNA was extracted
from each of the CRISPR-Cas9 transgenic plants and from wild-type plants. A CEL-I
assay was performed with a Guide-it Mutation Detection Kit (Takara Bio) on
300-bp PCR products spanning the gRNA target locus. These products were amplified
with PrimeSTAR GXL DNA polymerase (Takara Bio) from selected T1 plants containing
pEgP526-2A-GFBSD2 or from wild-type plants. After mutations induced by
pEgP526-2A-GFBSD2 were confirmed, the gRNA was introduced into
pEgP126_Paefl-2A-GFPSD2 to obtain bi-allelic mutants in an early generation.
Sequences of PCR products and sub-clones obtained from T2 plant DNA were analysed
to determine the segregation of mutations in T2 and T3 plants, and T3 homozygous
mutants were isolated for further analysis.

Quantitative RT-PCR analysis. Total RNA was isolated using a Trizol-modified
reagent. First-strand cDNA was synthesized from 5 µg of total RNA using random
hexamer primers and SuperScript III reverse transcriptase (Invitrogen Corporation).
Quantitative RT-PCR was performed with SYBR Premix Ex Taq (Takara Bio) and
gene-specific primers designed according to the instructions provided with Primer
Express Software version 3.0.1 (Life Technologies Corporation). Reactions were
analysed using a 7500 Fast Real-Time PCR system (Applied Biosystems) with the
following gene-specific primer sets: CLE25 forward, 5′-GGTAAGGATGTGAATCTGTTTCATGT-3′;
CLE25 reverse, 5′-TCTGCTTTCCTGTTGTGGATAGG-3′; NCED3 forward,
5′-CACGATTTCGCGATTACAGAGA-3′; NCED3 reverse, 5′-CCGGCAGCTTGAAAACGA-3′;
LEA forward, 5′-GCAAAACGCGAGCTACCAA-3′; LEA reverse, 5′-GTCCAGTCTGTTGCAAGGAGTCT-3′;
RD29B forward, 5′-GCGCACCAGTGTATGAATCCT-3′; RD29B reverse,
5′-CGGCATGACTAAGAGACTTAGGTTT-3′; Actin2 forward, 5′-AGTGGTCGTACAACCGGTATTGT-3′;
and Actin2 reverse, 5′-GATGGCATGAGGAAGAGAGAAAC-3′.
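The Methods list an Actin2 primer set but do not name the quantification model. One common choice is the comparative Ct (2^-ΔΔCt) method, with Actin2 as the reference gene and a control sample as the calibrator. The sketch below assumes that method and roughly 100% amplification efficiency for every primer pair. The Ct values are invented for illustration.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the comparative Ct (2^-ddCt) method.

    ct_target / ct_ref: Ct of the gene of interest and of the reference gene
    (for example Actin2) in the sample. The *_cal arguments are the same two
    Ct values in the calibrator (for example an untreated control).
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_calibrator = ct_target_cal - ct_ref_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target gene reaches threshold 3 cycles earlier
# than in the calibrator after normalization to the reference gene.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```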

Histochemical analysis of GUS expression in transgenic plants. Transgenic
Arabidopsis plants harbouring the GUS reporter gene fused to the 1.19-kb
Arabidopsis CLE25 promoter were used for histochemical GUS assays. To analyse
CLE25 expression, 4-, 9-, 12- or 16-day-old seedlings were stained with a
GUS staining buffer (100 mM Tris-HCl, pH 7.0, 2 mM ferricyanide and 1 mM
5-bromo-4-chloro-3-indolyl-β-D-glucuronide) for 16 h at 37 °C. GUS expression
profiles were then examined under a microscope. The images shown represent
9-day-old seedlings. The roots of transgenic plants were fixed in a 1:3 mixture of
acetic acid:ethanol and mounted in a mixture of chloral hydrate:glycerol:water (8:1:2).
For cross sections, the fixed roots were passed through an ethanol series and
embedded in Technovit 7100 (Heraeus Kulzer). Sections were cut at 5-µm
thickness with a microtome (HM335E; MICROM GmbH).

In situ hybridization analysis. Arabidopsis tissues were fixed with 10% neutral-buffered
formalin (NBF) + 50% ethanol solution, embedded in paraffin on CT-Pro20 using
G-Nox as an organic solvent less toxic than xylene, and sectioned at a thickness
of 5 µm. In situ hybridization was performed with an ISH Reagent Kit (GenoStaff)
according to the manufacturer's instructions. Tissue sections were deparaffinized
with G-Nox and rehydrated through an ethanol series and phosphate-buffered saline
(PBS). The sections were fixed with 10% formalin in PBS for 15 min at room
temperature, washed in PBS, treated with 3 µg ml−1 proteinase K (Wako Pure
Chemical Industries) in PBS for 10 min at 37 °C, washed in PBS, re-fixed with
10% NBF for 15 min at room temperature, washed in PBS, placed in 0.2 N HCl for
10 min at room temperature, washed in PBS and placed in a Coplin jar containing
1× G-Wash (GenoStaff), which is equivalent to 1× saline sodium citrate.
Hybridization was performed with probes at 300 ng ml−1 in G-Hybo-L (GenoStaff)
for 16 h at 60 °C, followed by 50% formamide in 1× G-Wash for 10 min at 60 °C.
The sections were washed twice in 1× G-Wash for 10 min at 60 °C, twice in
0.1× G-Wash for 10 min at 60 °C and finally twice in Tris-buffered saline with
0.1% Tween 20 (TBST) at room temperature. After treatment with 1× G-Block
(GenoStaff) for 15 min at room temperature, the sections were incubated with an
anti-DIG AP conjugate (Roche Diagnostics K.K.) diluted 1:2,000 with 50× G-Block
(GenoStaff) in TBST for 1 h at room temperature. The sections were washed twice
in TBST and then incubated in 100 mM NaCl, 50 mM MgCl2, 0.1% Tween 20 and
100 mM Tris-HCl (pH 9.5). Colouring reactions were performed overnight with an
NBT/BCIP solution (Sigma-Aldrich), after which the sections were washed with PBS.
The sections were counterstained with Kernechtrot stain solution (Muto Pure
Chemicals) and mounted with G-Mount (GenoStaff).

Dehydration stress treatments. To induce dehydration stress, 18-day-old seedlings
of each transgenic plant or 25-day-old seedlings of each grafted plant were
transferred into 3 ml of water for 16 h. Plants were then removed from the water and
incubated for the times indicated under long-day conditions (16 h light:8 h dark) at
22-25 °C and 45-60% relative humidity to induce dehydration stress. A minimum
of four seedlings was used for each experimental condition. All rosette leaves
(Figs. 2a, 4c and Extended Data Figs. 1a, 7d), all roots (Fig. 2a and Extended Data
Fig. 7c) and all whole seedlings (Figs. 3a-c, 4d and Extended Data Figs. 2c-e, 5a-d)
were collected for use in other experiments.

The amount of water loss of wild-type and mutant plants was calculated by 
weighing each plant at the times indicated. All changes in fresh weight are pre- 
sented as percentages in Fig. 3e. 
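The Methods state that changes in fresh weight are shown as percentages but do not give the formula. One common convention, assumed here, divides each weighing by the plant's initial fresh weight. The remaining weight then starts at 100%, and cumulative water loss is 100% minus that value. The weights below are hypothetical.

```python
def fresh_weight_percent(weights):
    """Express each weighing as a percentage of the first (initial) fresh weight."""
    if not weights or weights[0] <= 0:
        raise ValueError("need a positive initial fresh weight")
    initial = weights[0]
    return [100.0 * w / initial for w in weights]


# Hypothetical weighings (g) of one plant at 0, 15, 30 and 60 min of dehydration
percent = fresh_weight_percent([2.0, 1.8, 1.6, 1.5])
water_loss = [100.0 - p for p in percent]  # cumulative loss, % of initial fresh weight
print(percent)
print(water_loss)
```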

To analyse dehydration stress-sensitive phenotypes, wild-type, transgenic and 
mutant plants were grown in soil at 22 °C for 4 weeks under long-day conditions 
(illumination at 40-60 µmol photons s−1 m−2). The water content ratio of each pot
was adjusted to 59.1% (Daio Kasei, professional grove soil, 45 g; water, 65 g), and
then the water supply was stopped. Each pot was rotated every half day while
water was withheld. After 14-16 days without water, watering was resumed.
Images of the same plants before dehydration, after dehydration and after
rehydration are shown. Survival rates of each genotype were
measured during rehydration after dehydration.
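The 59.1% water content follows from the stated amounts if the ratio is defined as water mass over total (soil + water) mass: 65 g / (45 g + 65 g) ≈ 0.591. The Methods do not state this definition explicitly. The sketch assumes it because it reproduces the reported figure.

```python
def water_content_percent(soil_g: float, water_g: float) -> float:
    """Water as a percentage of total pot mass (soil + water); definition assumed."""
    total = soil_g + water_g
    if total <= 0:
        raise ValueError("total mass must be positive")
    return 100.0 * water_g / total


print(f"{water_content_percent(45, 65):.1f}%")  # 59.1%
```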

Measurement of ABA levels. To quantify ABA levels, dehydration stress was 
induced in detached roots or whole seedlings by incubation for the times indicated 
in the figures. All rosette leaves were collected from whole seedlings to measure 
ABA content in leaves only. All samples were ground in liquid nitrogen. ABA was 
extracted with 80% methanol containing 500 mg l−1 citric acid and 10 mg l−1 butylated
hydroxytoluene. After centrifugation to remove debris, the supernatant was
dried and reconstituted in Tris-buffered saline (25 mM Tris, 100 mM NaCl,
1 mM MgCl2, pH 7.5). ABA was measured with a Phytodetek ABA measurement
kit (Agdia) according to the manufacturer’s instructions. 

Micrografting. Wild-type, transgenic and mutant plant seeds were sown on half-strength
MS medium containing 0.7% agar and 0% sucrose with cellulose nitrate
membranes under long-day conditions (10-20 µmol photons s−1 m−2) at 22 °C.
Hypocotyls were cut from 5-day-old seedlings of each genotype using a syringe
needle. Shoot scions from wild-type, transgenic or mutant plants were reciprocally
grafted onto rootstocks of wild-type, transgenic or mutant plants. Grafted
plants were incubated on half-strength MS medium containing 1.5% agar and
0% sucrose under long-day conditions (10-20 µmol photons s−1 m−2) at 26 °C
for 4 days and then at 22 °C for 2 days. After growing on half-strength
MS medium with 0.8% agar and 1% sucrose for 4 days, plants were transferred
to charcoboll soil (IMPACK) and grown for 10 days under long-day conditions
(illumination at 40-60 µmol photons s−1 m−2).


Chlorophyll measurement. Seeds of wild-type, cle25 mutant or bam1-5 bam3-3 
mutant plants were germinated and grown on germination medium agar plates 
containing different concentrations of NaCl (0, 130, 140 or 150 mM) for 16 days. 
Seedlings were collected and ground in liquid nitrogen. Chlorophyll was extracted 
with 80% acetone. Absorbance was measured at 646.6 and 663.6 nm using an 
EnSpire multimode plate reader (PerkinElmer, Waltham, MA, USA). Chlorophyll 
content was determined as chlorophyll a + chlorophyll b = $17.76 \times A_{646.6} + 7.34 \times A_{663.6}$.
The chlorophyll content of wild-type plants grown at 0 mM NaCl was used to
normalize the chlorophyll content of wild-type plants, cle25 mutants and
bam1-5 bam3-3 mutants grown with or without NaCl.
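The chlorophyll calculation and normalization described above are sketched below. The coefficients come from the Methods. The absorbance readings are hypothetical. The Methods do not state units, but they cancel in the ratio to the wild-type 0 mM NaCl control.

```python
def total_chlorophyll(a646_6: float, a663_6: float) -> float:
    """Chlorophyll a + b from absorbances at 646.6 and 663.6 nm (80% acetone extract)."""
    return 17.76 * a646_6 + 7.34 * a663_6


def relative_to_control(sample: float, control: float) -> float:
    """Normalize to wild type grown at 0 mM NaCl, as described in the Methods."""
    if control <= 0:
        raise ValueError("control chlorophyll content must be positive")
    return sample / control


# Hypothetical plate-reader absorbances
wt_0mm = total_chlorophyll(0.20, 0.50)
mutant_150mm = total_chlorophyll(0.10, 0.25)
print(round(relative_to_control(mutant_150mm, wt_0mm), 3))  # 0.5
```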

Gas exchange measurements. Stomatal conductance was assayed in 25-day-old
wild-type and cle25 mutant seedlings using a portable gas exchange system
(LI-6400; LI-COR). The illumination was set at 60 µmol photons s−1 m−2, the
air flow was set to 500 µmol s−1 and the CO2 concentration of the air was
controlled at 400 p.p.m. using a CO2 cylinder during experiments.

Observation of vasculature. Cotyledons and roots were fixed in a 1:3 mixture of 
acetic acid:ethanol and mounted in a mixture of chloral hydrate:glycerol:water 
(8:1:2). Vascular images were obtained with a light microscope (BX51; Olympus). 
For cross sections, fixed roots were passed through an ethanol series and embedded
in Technovit 7100 (Heraeus Kulzer). Sections were cut at 3-µm thickness with an
ultramicrotome (RM2165; Leica). The sections were stained
with 0.01% toluidine blue O and observed under a light microscope (BX51; 
Olympus). 

Statistical analyses and reproducibility. All statistical tests and n numbers,
including sample sizes and biological replicates, are described in the figure legends.
In box-and-whisker plots, central lines indicate the median and the variation
indicates the interquartile range. In dot plots, central lines indicate the median.
For comparisons between two groups, a two-tailed Student's t-test was used. For
comparisons among more than two groups, one-way ANOVA was used, followed by a
Tukey’s or a Tukey-Kramer post hoc test. For assessment of two independent
variables, two-way ANOVA was used, followed by a Tukey’s post hoc test. Exact
P values are not provided for comparisons among more than two groups because
these were analysed by one-way ANOVA followed by a Tukey’s or Tukey-Kramer
post hoc test. The t-test analyses were performed with an alpha level of 0.01 or
0.05 and yielded the following t values (t) and degrees of freedom (d.f.). For
Extended Data Fig. 1a: dehydration, t = 17.36, d.f. = 6, P = 0.00000021; CLE25,
t = 0.38, d.f. = 6, P = 0.72; CLV3, t = 1.75, d.f. = 6, P = 0.13; CLE46, t = 1.64,
d.f. = 6, P = 0.15; and TDIF, t = 1.13, d.f. = 6, P = 0.30. For Extended Data Fig. 7d:
wild-type/CLE25 RNAi #13 2 h, t = 1.17, d.f. = 10, P = 0.27; wild-type/CLE25 RNAi #13
5 h, t = 1.21, d.f. = 10, P = 0.25; and CLE25 RNAi #13/wild-type 5 h, t = 0.21,
d.f. = 10, P = 0.84. For Extended Data Fig. 9: cle25 #10 130 mM, t = 3.84, d.f. = 4,
P = 0.018; bam1-5 bam3-3 130 mM, t = 4.02, d.f. = 4, P = 0.016; cle25 #10 140 mM,
t = 3.88, d.f. = 4, P = 0.018; bam1-5 bam3-3 140 mM, t = 5.70, d.f. = 4, P = 0.0026;
cle25 #6 150 mM, t = 4.74, d.f. = 4, P = 0.0090; cle25 #10 150 mM, t = 8.02, d.f. = 4,
P = 0.0013; and bam1-5 bam3-3 150 mM, t = 10.92, d.f. = 4, P = 0.0004.
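The degrees of freedom reported above match a pooled two-sample Student's t-test, for which d.f. = n1 + n2 − 2. Four replicates per group give 6 (Extended Data Fig. 1a), six give 10 (Extended Data Fig. 7d) and three give 4 (Extended Data Fig. 9). The pure-Python sketch below computes the statistic from made-up data. It is not the authors' analysis code. A P value would also require the t distribution, for example from a statistics library, which is omitted here.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed).

    Returns (t, degrees of freedom), where d.f. = n1 + n2 - 2.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Hypothetical data: three biological replicates per group
t, df = student_t([1.0, 0.9, 1.1], [0.6, 0.7, 0.5])
print(round(t, 2), df)
```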

ANOVA analyses were performed with an alpha level of 0.01 or 0.05 and yielded
the following F values (F) and degrees of freedom (d.f.). For Fig. 1a,
F = 1,605.09, d.f. = 9. For Fig. 1b, F = 1,845.46, d.f. = 5. For Fig. 2a, F = 45.35,
d.f. = 5. For Fig. 3a, F = 114.24, d.f. = 14. For Fig. 3b, F = 559.38, d.f. = 14.
For Fig. 3c, F = 148.82, d.f. = 14. For Fig. 3d: leaves, F = 53.141, d.f. = 8; roots,
F = 3.29, d.f. = 8. For Fig. 3e: 5 min, F = 5.16, d.f. = 2; 10 min, F = 6.41, d.f. = 2;
15 min, F = 23.58, d.f. = 2; 20 min, F = 29.14, d.f. = 2; 25 min, F = 32.11, d.f. = 2;
30 min, F = 41.65, d.f. = 2; 35 min, F = 33.53, d.f. = 2; 40 min, F = 38.12, d.f. = 2;
45 min, F = 32.37, d.f. = 2; 50 min, F = 35.26, d.f. = 2; 55 min, F = 27.96, d.f. = 2;
60 min, F = 26.48, d.f. = 2. For Fig. 4a, F = 418.60, d.f. = 5. For Fig. 4b, F = 283.88,
d.f. = 8. For Fig. 4c, F = 913.17, d.f. = 7. For Fig. 4d, F = 513.85, d.f. = 11. For
Fig. 4e: leaves, F = 34.91, d.f. = 5; roots, F = 8.38, d.f. = 5. For Fig. 4g, F = 576.44,
d.f. = 11. For Fig. 4h, F = 99.98, d.f. = 5. For Extended Data Fig. 1b, F = 50.17,
d.f. = 5. For Extended Data Fig. 1c, F = 23.98, d.f. = 5. For Extended Data Fig. 1f,
F = 1,713.73, d.f. = 6. For Extended Data Fig. 2c, F = 190.26, d.f. = 9. For Extended
Data Fig. 2d, F = 529.81, d.f. = 9. For Extended Data Fig. 2e, F = 61.53, d.f. = 9. For
Extended Data Fig. 3, F = 3,302.34, d.f. = 8. For Extended Data Fig. 4a, F = 17.78,
d.f. = 19. For Extended Data Fig. 4b, F = 2.52, d.f. = 2. For Extended Data Fig. 4c,
F = 0.092, d.f. = 2. For Extended Data Fig. 4d, F = 0.69, d.f. = 2. For Extended Data
Fig. 5a, F = 159.01, d.f. = 14. For Extended Data Fig. 5b, F = 1,430.26, d.f. = 14. For
Extended Data Fig. 5c, F = 1,829.29, d.f. = 14. For Extended Data Fig. 5d, F = 1,405,
d.f. = 14. For Extended Data Fig. 7a, F = 107.24, d.f. = 5. For Extended Data Fig. 7b,
F = 1,959.43, d.f. = 5. For Extended Data Fig. 7c, F = 70.70, d.f. = 19. For Extended
Data Fig. 10a, F = 52.66, d.f. = 5. For Extended Data Fig. 10n, genotypes, F = 3.29,
d.f. = 2. For Extended Data Fig. 10p, genotypes, F = 0.35, d.f. = 2. For Extended
Data Fig. 10r, genotypes, F = 5.89, d.f. = 2.
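For reference, the one-way F statistic is the ratio of the between-group mean square to the within-group mean square. The Methods report a single d.f. per test without saying which component it is, so this pure-Python sketch returns both the between-group (k − 1) and within-group (N − k) degrees of freedom. It uses made-up data and is an illustration of the technique, not the authors' code.

```python
from statistics import mean


def one_way_anova(*groups):
    """One-way ANOVA F statistic.

    Returns (F, df_between, df_within) with df_between = k - 1 and
    df_within = N - k, where k is the number of groups and N the total count.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [mean(g) for g in groups]
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variation is zero; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data: three genotypes, three replicates each
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```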

Statistical methods were used to predetermine sample size. For samples drawn
from an effectively infinite population, such as in the analysis of stomatal aperture,
we calculated and determined the sample size with Excel and Visual Basic
for Applications. Parametric tests were used for the other statistical analyses because
the data indicated that the population was normally distributed and that the
population variances were equal. A sample size of n > 6 is generally considered
sufficient for these tests. For these reasons, we set the sample size of key data
to n > 6 in our analysis. All samples were allocated randomly into experimental
groups, and all experiments were blinded during data acquisition and analyses.
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Source data for Figs. 1-4 and Extended Data Figs. 1-5, 7, 9 and
10 are provided with the paper. Raw gel electrophoresis images are provided
in Supplementary Fig. 1. Sequence data used in this paper can be found in The
Arabidopsis Information Resource (TAIR) database (https://www.arabidopsis.org/) 


LETTER 


under the following accessions: At3g28455 (CLE25), At3g14440 (NCED3), 
At3g02480 (LEA), At5g52300 (RD29B) and At3g18780 (ACT2). Other data that
support the findings of this study are available from the corresponding authors 
upon request. 
Extended Data Fig. 1 | Effects of synthetic CLE peptide application
on NCED3 expression and stomatal closure in leaves, and movement
of CLE25 peptide from roots to leaves. a, NCED3 expression in leaves of
wild-type plants after application of 5 µM of each synthetic CLE peptide to
roots for 3 h, or in response to dehydration stress for 3 h
(n = 4 biological replicates). **P < 0.01; NS, no significant difference
among treatment conditions; as analysed by two-tailed Student's t-test
(see Methods for exact P values). b, NCED3 expression in leaves of
wild-type plants after application of 1 µM peptide to roots for 3 h (n = 6
biological replicates). c, ABA content in leaves after application of 1 µM
peptide to roots for 3 h (n = 8 biological replicates). d, Comparison
of peptide sequences of CLE25 and CLE26. e, Typical images of wild-type
(n = 6 biological replicates), 1 µM ABA-induced and 1 µM CLE25-induced
stomatal closure (n = 4 and 6 biological replicates, respectively). Scale
bars, 20 µm. f, Roots of whole plants were incubated without (0 h on
x axis, n = 547) or with 0.01% acetonitrile (mock, n = 519), ABA (n = 562)
or each CLE peptide (n = 546, CLE26; n = 578, CLV3; n = 762, CLE46;
n = 561, TDIF) for 3 h. Data are from three experiments. **P < 0.01,
***P < 0.001 as analysed by one-way ANOVA followed by a Tukey’s
(b, c) or a Tukey-Kramer (f) post hoc test. g, h, Detection of non-labelled
(g) and isotope-labelled (h) CLE25 peptide by nLC-MS/MS. These
experiments were repeated two times independently with similar results.
Hyp, hydroxyproline.
Extended Data Fig. 2 | The cle25 mutants are generated using the CRISPR-Cas9
method, and clv3-8, cle46-1 and tdr-1 mutants do not exhibit
repression of NCED3 expression in response to dehydration stress.
a, CEL-I analysis of T1 cle25 mutants. The CLE25 locus was amplified
in wild-type and cle25 mutants, then digested with CEL-I. The asterisk
indicates mutated bands digested with CEL-I nuclease. These experiments
were repeated four times independently with similar results. For gel
source data, see Supplementary Fig. 1. b, CRISPR-Cas9-induced mutation
detected by amplicon sequencing in T3 plants. The exons (boxes) and
intron (line) indicate the schematic arrangement of the CLE25 gene.
Selected target sequences (18 base pairs) are shown in red boxes and
protospacer adjacent motif sequences are shown as green characters.
A single-base deletion (guanosine at position 22) was detected in the genomic
DNA of the mutants. The red triangle shows the position of the base
deletion site. This mutation created a stop codon after the mutation
site (asterisk in the amino acid sequence). c-e, Dehydration-induced
NCED3 expression in clv3-8 (c), cle46-1 (d) and tdr-1 (e) mutants was not
repressed compared with that in wild-type (Cont.) plants in response to
dehydration stress (n = 3 pooled biological replicates). CLV3, CLE46 and
TDIF peptides do not have a primary function in the dehydration stress
response that mediates ABA signalling. *P < 0.05, **P < 0.01 as analysed
by one-way ANOVA followed by a Tukey’s post hoc test (c-e). The clv3-8
mutant was a point mutant of CLV3. The cle46-1 and tdr-1 mutants were
transfer DNA mutants of CLE46 and TDIF RECEPTOR, respectively.
Extended Data Fig. 3 | nced3-2 and aba2-1 mutants treated with
CLE25 peptide do not exhibit stomatal closure. Detached rosette
leaves were incubated without (labelled ‘0’ on x axis: n = 505, wild type
(Cont.); n = 647, nced3-2; n = 564, aba2-1) or with ABA (n = 617, wild
type; n = 591, nced3-2; n = 467, aba2-1) or the CLE25 peptide (n = 505,
wild type; n = 517, nced3-2; n = 570, aba2-1) for 3 h. Data are from three
experiments. ***P < 0.001 as analysed by one-way ANOVA followed by a
Tukey-Kramer post hoc test.
Extended Data Fig. 4 | Stomatal conductance, rosette diameter, fresh
weight and dry weight of wild-type and cle25 mutants grown on soil
under control conditions. a, The stomatal conductance of wild-type
(Cont.) plants, and cle25 and nced3-2 mutants (n = 6 biological replicates)
was measured. Data are plotted at 5-min intervals for 20 min under control
conditions. b, Rosette size of wild-type plants and cle25 mutants (n = 6
biological replicates) grown on soil was scored. c, Fresh weight of wild-type
plants and cle25 mutants (n = 6 biological replicates) grown on soil
was measured. d, Dry weight of wild-type plants and cle25 mutants (n = 6
biological replicates) grown on soil was measured. **P < 0.01 as analysed
by one-way ANOVA followed by a Tukey’s post hoc test (a-d).
Extended Data Fig. 5 | Repression of CLE25 in transgenic plants
affects the expression of dehydration-induced genes and causes hypersensitivity
to dehydration stress. a-d, Dehydration-induced CLE25 (a), NCED3 (b),
LEA (c) and RD29B (d) expression in wild-type (Cont.) and CLE25 RNAi
plants in response to dehydration stress (n = 6 biological replicates).
**P < 0.01 as analysed by one-way ANOVA followed by a Tukey’s post
hoc test (a-d). e, CLE25 RNAi plants and the nced3-2 mutant had a
dehydration stress-sensitive phenotype (plants per group: n = 85, wild type
(Cont.); n = 85, nced3-2; n = 85, CLE25 RNAi #10; n = 60, CLE25 RNAi
#12; n = 60, CLE25 RNAi #13). Data are from three experiments. Scale
bars, 2 cm.
Extended Data Fig. 6 | Endogenous CLE25 peptide is secreted
extracellularly. Arabidopsis T87 cells were cultured with or without
0.4 M mannitol for 4 h. Then, peptides in the liquid culture medium
were purified and analysed by nLC-MS/MS. a, Selected MS/MS ion
chromatograms of the y4-ion from triply charged CLE25 peptides treated
with 0.4 M mannitol. b, MS/MS spectra of endogenous (upper, with 0.4 M
mannitol treatment) and synthetic (lower) CLE25 peptide obtained by
nLC-MS/MS. Endogenous CLE peptide with 0.4 M mannitol treatment
was detected only in the di-hydroxy form. These experiments were repeated
two times independently with similar results (a, b). Hyp, hydroxyproline.
c, List of the top 10 proteins in T87 cells in response to mannitol
treatment. Amounts of these top 10 proteins accumulated in the liquid
culture medium were the same under control conditions and in response
to mannitol treatment. Cell lysis did not occur in response to mannitol
treatment.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




[Extended Data Fig. 7 panels: CLE25 and NCED3 expression (relative) in Cont. and CLE25 RNAi #13 plants at 0 and 3 h after mock or CLE25 treatment; CLE25 and NCED3 expression (relative) after 0, 0.5, 1, 2 and 5 h of dehydration in grafts combining Cont. and CLE25 RNAi #13 scions and rootstocks]

Extended Data Fig. 7 | CLE25 peptide moves from roots to leaves
and modulates NCED3 expression in leaves according to grafting
experiments. a, b, CLE25 expression (a) and NCED3 expression (b) in leaves of
wild-type (Cont.) and CLE25 RNAi plants after application of CLE25 peptide to
roots (n = 6 biological replicates). c, Dehydration-induced CLE25 expression in
grafted plants in which shoots and roots were grafted between wild-type and
CLE25 RNAi plants (n = 6 biological replicates). *P < 0.05, **P < 0.01 as analysed
by one-way ANOVA followed by a Tukey’s post hoc test (a-c). d, Dehydration-induced
NCED3 expression in grafted leaves in which shoots and roots were grafted between
wild-type and CLE25 RNAi plants (n = 6 biological replicates). No significant
difference (NS) among the three genotypes as analysed by two-tailed
Student’s t-test (see Methods for exact P values).
Extended Data Fig. 8 | Root-derived endogenous CLE25 peptide
accumulates in dehydrated leaves. Accumulation of CLE25 peptide
in leaves of wild-type and cle25-mutant shoot scions grafted onto
wild-type rootstocks was analysed by nLC-MS/MS. a, b, MS/MS ion
chromatograms of triply charged CLE25 peptides in dehydrated leaves of
grafted wild-type/wild-type (a) or cle25 #10/wild-type (b) plants under
3-h dehydration-stress conditions. c, MS/MS spectra of endogenous
CLE25 peptide in leaves of grafted cle25 #10/wild-type plants under 3-h
dehydration-stress conditions. d, MS/MS spectra of synthetic CLE25
peptide. These experiments were repeated two times independently with
similar results (a-d). Hyp, hydroxyproline.
Extended Data Fig. 9 | cle25 and bam1-5 bam3-3 mutants show a
salinity stress-sensitive phenotype. a, Images represent 16-day-old
seedlings of each genotype grown on germination medium agar plates
containing 0 mM or 150 mM NaCl (n = 3 biological replicates). Scale bars,
0.5 cm. b, Relative chlorophyll contents of 16-day-old seedlings in response to
treatment with different NaCl concentrations (n = 3 pooled biological
replicates). *P < 0.05, **P < 0.01 as analysed by two-tailed Student’s t-test
(see Methods for exact P values).
Extended Data Fig. 10 | See next page for caption.


Extended Data Fig. 10 | NCED3 expression in CLE25-treated leaves of
bam1-5 bam3-3 mutants, vascular development of wild-type, cle25 and
bam1-5 bam3-3 mutants under control conditions and root growth
phenotypes of cle25, nced3-2 and aba2-1 mutants under control
conditions or long-term application of CLE25 peptide. a, NCED3
expression in the leaves of wild-type (Cont.) and bam1-5 bam3-3 mutants
(n = 6 biological replicates) after application of CLE25 peptide to leaves.
**P < 0.01 as analysed by one-way ANOVA followed by a Tukey’s post
hoc test. b-e, Microscopy images of the leaf vasculature of wild-type
(b; n = 6 biological replicates), cle25 #6 (c; n = 12 biological replicates),
cle25 #10 (d; n = 12 biological replicates) and bam1-5 bam3-3 mutants
(e; n = 12 biological replicates). Scale bars, 1 mm. f-i, Microscopy images
of protoxylem and metaxylem vessel formation in wild-type (f; n = 4
biological replicates), cle25 #6 (g; n = 3 biological replicates), cle25 #10
(h; n = 4 biological replicates) and bam1-5 bam3-3 mutants (i; n = 4
biological replicates). Scale bars, 50 µm. j-m, Cross sections of the primary
root of wild-type (j; n = 9 biological replicates), cle25 #6 (k; n = 10
biological replicates), cle25 #10 (l; n = 10 biological replicates) and bam1-5
bam3-3 mutants (m; n = 8 biological replicates). Scale bars, 20 µm.
n, Root length of wild-type and cle25 mutants from six to eleven days
after germination on germination medium agar plates (n = 16 biological
replicates). o, Images represent 8-day-old or 10-day-old seedlings of each
genotype on germination medium agar plates (n = 4 biological replicates).
Scale bars, 2 cm. p, Relative root length of wild-type, nced3-2 and aba2-1
mutants from seven to eleven days after germination on germination medium
agar plates (n = 12 biological replicates). q, Images represent 8-day-old or
10-day-old seedlings of each genotype on germination medium agar plates
(n = 3 biological replicates). Scale bars, 2 cm. r, Relative root length of
wild-type, nced3-2 and aba2-1 mutants from seven to eleven days after
germination on germination medium agar plates containing 1 µM CLE25
peptide (n = 12 biological replicates). s, Images represent 8-day-old or
10-day-old seedlings of each genotype on germination medium agar plates
containing 1 µM CLE25 peptide (n = 3 biological replicates). Scale bars, 2 cm.
Two-way ANOVA followed by a Tukey’s post hoc test indicated no differences
among genotypes (n, p, r).
Cardiac tissues generated from human induced pluripotent stem
cells (iPSCs) can serve as platforms for patient-specific studies of
physiology and disease. However, the predictive power of these
models is presently limited by the immature state of the cells.
Here we show that this fundamental limitation can be overcome 
if cardiac tissues are formed from early-stage iPSC-derived 
cardiomyocytes soon after the initiation of spontaneous contractions 
and are subjected to physical conditioning with increasing intensity 
over time. After only four weeks of culture, for all iPSC lines 
studied, such tissues displayed adult-like gene expression profiles, 
remarkably organized ultrastructure, physiological sarcomere 
length (2.2 µm) and density of mitochondria (30%), the presence of
transverse tubules, oxidative metabolism, a positive force-frequency 
relationship and functional calcium handling. Electromechanical 
properties developed more slowly and did not achieve the stage of 
maturity seen in adult human myocardium. Tissue maturity was 
necessary for achieving physiological responses to isoproterenol and 
recapitulating pathological hypertrophy, supporting the utility of 
this tissue model for studies of cardiac development and disease. 

Even the best available methods have limited ability to emulate
the physiology of adult myocardium: excitation-contraction
coupling (requiring transverse tubules (T-tubules)), a positive
force-frequency relationship (requiring mature calcium handling) and
efficient energy conversion (requiring oxidative metabolism) are notably
absent. Adult ventricular myocytes are uniquely organized for
beating function, having densely packed sarcomeres, mitochondria,
transverse tubules and sarcoplasmic or endoplasmic reticulum (SR/ER).
Their mitochondria are positioned adjacent to sarcomeres and
calcium pumps to enhance ATP diffusion; the sarcoplasmic reticulum
provides fast delivery of stored calcium ions to contractile proteins; and
the T-tubules synchronize heartbeats by concentrating L-type calcium
channels, which are positioned close to the ryanodine receptors that
release calcium ions from the SR/ER. This highly specialized machinery
for excitation-contraction coupling is not present in the fetal heart,
but emerges after birth with the switch from glycolytic to oxidative
metabolism that supports the energy demands of the postnatal heart.

Human iPSC-derived cardiomyocytes (hiPS-CMs) can be matured
by long-term culture and electrical, hydrodynamic and mechanical
stimulation. Recent studies have indicated that this in vitro
maturation may not follow the in vivo developmental paradigm: high
stimulation frequencies benefit maturation in vitro, whereas the
native heart beats more slowly after birth. We investigated
why current strategies fail to develop the characteristics
of adult myocardium. Because the responsiveness of hiPS-CMs to
physical stimuli declines as differentiation progresses, we hypothesized
that electromechanical conditioning should be initiated early, during
the period of high cell plasticity. As the heart matures in response to
energy demands, we further hypothesized that increasing the intensity
of induced contractions would enhance the development of mature
ultrastructure and function.




To test these hypotheses, we studied the maturation of human cardiac 
tissues grown from early-stage hiPS-CMs (day 12, immediately follow- 
ing the first spontaneous contractions) or late-stage hiPS-CMs (day 28, 
matured in culture). Cardiac tissues were assembled in a modular tissue 
platform that enabled individual control of the culture environment 
and physical signalling. hiPS-CMs (derived from three donors) and 
supporting fibroblasts were incorporated into fibrin hydrogel stretched 
between two flexible pillars (designed to provide mechanical forces 
similar to those in native myocardium) and subjected to electrical stimulation
to induce auxotonic contractions. Three conditioning regimes
were applied: (i) control (no stimulation); (ii) constant (three weeks at
2 Hz); and (iii) intensity training (two weeks at a frequency increasing
from 2 Hz to 6 Hz by 0.33 Hz per day, followed by one week at 2 Hz).
The resulting tissues were 6 mm long and 1.8 mm in diameter, and
were evaluated in real time (for contractile and conductive behaviour 
and calcium handling) and by end-point assays (for gene expression, 
proteins and ultrastructure), using human fetal cardiac tissues (FCTs) 
and adult human heart ventricles as benchmarks (Fig. 1a, Extended 
Data Fig. la—e). Intensity-trained tissues grown from early-stage 
hiPS-CMs (hereafter early-stage intensity-trained) exhibited compact 
and well-differentiated cardiac muscle (Extended Data Fig. 1f-p) and 
marked changes in the expression of genes associated with adult-like 
conduction (increased ITPR3, KCNH2, decreased HCN4), maturation 
(increased NPPB, MAPK1, PRKACA), ultrastructure (increased MYH7, 
GJA1, TNNI3, AKAP6, GJA5, JPH2), energetics (increased AKAP1, 
TFAM, PPARGCIA) and calcium handling (increased CAV3, BIN1, 
ATP2A2, RYR2, ITPR3). The other early-stage-derived tissues, all late- 
stage-derived tissues and FCTs displayed immature cardiac phenotypes 
(Fig. 1b, Extended Data Fig. 2a, b). 

Seeding with early-stage hiPS-CMs was critical for the response of
the mature tissues to physical signals. Only the early-stage
intensity-trained tissues displayed orderly signal propagation and anisotropic
gap junctions. Among all tested groups, early-stage intensity-trained
tissues had electrophysiological properties that were comparable to
those of Biowires, including the shape of the action potential with its
characteristic notch, the resting membrane potential of −70.0 ± 2.7 mV,
the IK1 current (peak inward density of −9.9 ± 3.8 pA pF⁻¹ and
peak outward density of 0.30 ± 0.12 pA pF⁻¹) and the conduction
velocity (25.0 ± 0.9 cm s⁻¹) (Fig. 1c, d, Extended Data Figs. 2, 3a-f,
Supplementary Videos 1, 2).

Early-stage intensity-trained tissues also exhibited a positive
force-frequency relationship (FFR), a hallmark of maturation not seen in
other in vitro myocardial tissue models. The generated forces markedly
exceeded those in all other tested groups and FCTs (Fig. 1f), but
remained below those in adult myocardium (44 mN mm⁻²). Directly
measured forces and contraction amplitudes increased approximately
twofold over the range of stimulation frequencies (1-6 Hz) as
early-stage intensity-trained tissues matured, indicating
maturation of contractile behaviour. These tissues acquired regular
contraction profiles, in contrast to late-stage intensity-trained tissues
(Extended Data Figs. 3g-i, 4a-d, Supplementary Videos 1, 2). The
surrogate measurements of force from calcium recordings in early-stage
intensity-trained tissues (Extended Data Fig. 3j, k) were consistent with
the direct force measurements.

Fig. 1 | Intensity training of cardiac tissues derived from early-stage
hiPS-CMs enhances maturation. a, Experimental design: early-stage
or late-stage hiPS-CMs and supporting fibroblasts were encapsulated
in fibrin hydrogel to form tissues stretched between two elastic pillars
and made to contract by electrical stimulation. A gradual increase in the
frequency of stimulation to supra-physiological levels (intensity regime)
was compared to stimulation at constant frequency (constant regime),
unstimulated controls and human adult and fetal heart ventricles. b, Gene
expression data for six groups of cardiac tissues, and adult and fetal heart
ventricles. c, Action potential for the early-stage intensity-trained
group. d, IK1 current-voltage (I-V) curves (mean ± s.d.). e, Early-stage
intensity-trained tissues from all three iPSC lines (C2A, WTC11,
IMR90), but not the other groups, developed a positive force-frequency
relationship after four weeks of culture. Line above graph indicates
P < 0.05 for the 2-6 Hz group versus other training regimes using
two-way ANOVA followed by Tukey's honest significant difference (HSD)
test. f, Cell area over time. Line above graph indicates P < 0.05 versus other
timepoints using two-way ANOVA followed by Tukey's HSD test; *P < 0.05
versus control group using one-way ANOVA followed by Tukey's HSD test.
Data in e and f are mean ± 95% confidence interval (CI). Sample sizes are
shown in Supplementary Information, 'Main figure data sample sizes'.

Cell populations were dominated by cardiomyocytes, and the
MLC2v+:MLC2a+ ratio, an indicator of cardiomyocyte maturity,
depended on the stimulation regime and developmental stage of
hiPS-CMs. The increasing contractile demands induced the adult-like
cardiac morphology that is necessary for high force generation in
early-stage intensity-trained tissues. The cell size increased (an indicator
of physiological hypertrophy) and both cells and nuclei were
elongated (an indicator of maturation). The sarcomere length reached
2.2 μm, similar to that of adult human ventricular myocytes.
The contractile capacity, fraction of cells containing sarcomeres and
organization of sarcomeric α-actinin also resembled those of adult
human myocardium (Fig. 1e, Extended Data Figs. 4e-k, 5).

Ultrastructural development was dependent on the stimulation
regime and the developmental stage of the hiPS-CMs from which the
tissues were derived. Only early-stage intensity-trained tissues displayed
orderly registers of sarcomeres with I-bands, A-bands, M lines, Z
lines, desmosomes, intercalated discs, a high density of mitochondria
positioned adjacent to the contractile machinery and proteins organized
for increased energetics (Fig. 2a, b, Extended Data Figs. 6, 7a, b).
Whereas the fetal heart favours glucose as the primary energy
substrate, the increased workload in the postnatal heart results in mature
mitochondria that are optimized for fatty acid oxidation. The per cent
area of mitochondria in early-stage intensity-trained tissues (30 ± 2.9%)
was similar to that measured in adult human myocardium. Active
mitochondrial biogenesis was associated with the production of phospholipids near
the sarcomeres, a switch to oxidative metabolism and the formation of
T-tubules (Fig. 2c-f, Extended Data Fig. 7c, d).

Fig. 2 | Enhanced cardiac ultrastructure, bioenergetics and T-tubule
formation in early-stage intensity-trained tissues derived from C2A
cells. a, Transmission electron microscopy (TEM) of tissues and cardiac
tissue models. Scale bars, 1 μm. b, d, g-k, Early-stage intensity-trained
tissues cultured for four weeks and derived from C2A cells. b, Registers
of sarcomeres, showing A-bands, I-bands, M lines, Z lines, sarcoplasmic
reticulum (SR) and T-tubules (TT). Scale bar, 1 μm. c, Density of
mitochondria; shaded area represents range of values measured in adult
human heart. Line above graph indicates P < 0.05 versus other training
regimes using two-way ANOVA followed by Tukey's HSD test. d, Lipid
droplets (red asterisk). Scale bar, 1 μm. e, Oxygen consumption rate
(OCR). Oligo, oligomycin; FCCP, carbonyl cyanide-4-(trifluoromethoxy)
phenylhydrazone; Rtn/AA, rotenone and antimycin A. f, Extracellular
acidification rate. 2-DG, 2-deoxyglucose. g-i, Cross-sections taken to
evaluate T-tubules: bright field view (g; scale bar, 500 μm); T-tubules
(h, i; green, WGA; red, cardiac troponin T (cTnT); blue, nuclei; scale bar,
10 μm). j, Calcium handling ultrastructure. Scale bar, 15 μm. k, Regular
spacing of calcium handling proteins (CaV1.2, RYR2, BIN1) shown in j.
AU, arbitrary units. Data are mean ± 95% CI; sample sizes are shown in
Supplementary Information.

Early-stage intensity-trained tissues contained robust T-tubules, visible both
longitudinally and in cross-sections. T-tubules (labelled using wheat
germ agglutinin (WGA) and di-8-ANEPPS) were co-localized with
bridging integrator 1 (BIN1), ryanodine receptor 2 (RYR2) and
L-type calcium channels (CaV1.2, encoded by CACNA1C), with spacing
optimized for calcium handling (Fig. 2g-k, Extended Data Fig. 8), as
in the adult heart. These tissues displayed spatially uniform cell
densities, presumably owing to the enhanced transport of nutrients and
metabolites during tissue contractions, generated the highest force, and
expressed the Ca2+-induced Ca2+ release (CICR) modulators RYR2
(control of SR/ER calcium release) and BIN1 (control of ion flux along
T-tubules) (Fig. 3a, Extended Data Figs. 8a, f, g, 9a).

The frequency-dependent acceleration of relaxation (FDAR), an
intrinsic property of adult myocardium, was observed in early-stage
intensity-trained tissues, showing that tissues subjected to
supra-threshold electrical pacing regimes developed mechanisms to
respond to the increasing workload. The presence of ultrastructural
machinery for contraction-relaxation was confirmed by the positioning
of T-tubules in proximity to the cardiac calcium pump SERCA2A
(encoded by ATP2A2) and the sodium-calcium exchanger NCX1
(encoded by SLC8A1). Consistently, transcription of the genes
responsible for clearing cytosolic calcium (ATP2A2 and SLC8A1) increased
over time, and the sequestration and extrusion of calcium became faster,
enabling the hiPS-CMs to relax and respond to contractile triggers.
Blocking CaV1.2 with nifedipine or verapamil gradually reduced calcium
transients in a training-dependent manner, whereas the response to caffeine
indicated that only the early-stage intensity-trained tissues had
functional intracellular calcium stores. Blocking SERCA with thapsigargin
to prevent SR/ER calcium uptake halted calcium transients,
indicating that they are dependent on a functional sarcoplasmic
reticulum. Subsequent addition of caffeine had no effect, consistent with
calcium depletion of the SR/ER (Fig. 3b-d, Extended Data Fig. 9b-h).
Post-rest potentiation confirmed the functionality of SR/ER calcium
stores in early-stage intensity-trained tissues. None of the other tested
tissues responded to increased calcium levels or developed calcium
alternans, owing to the lack of T-tubules and inefficient coupling between
calcium entry and intracellular calcium release. When CICR was blocked
with ryanodine to test RYR2 function, only early-stage
intensity-trained tissues showed a response, probably owing to the presence of
T-tubules, which are necessary for CICR. Notably, the positive FFR
was blunted by ryanodine treatment, and completely reversed when
calcium sequestration by SERCA2a was blocked with thapsigargin,
indicating the importance of both CICR and the reuptake of calcium
into the sarcoplasmic reticulum (Fig. 3e, Extended Data Fig. 9i-k).
Because a functional β-adrenergic receptor system is dependent on
both intracellular calcium reserves and the proximity of CaV1.2 channels
and T-tubules, comprehensive responses to β-adrenergic agonists
are an indicator of phenotypic maturation. We investigated whether
early-stage intensity-trained tissues had an inotropic response to
isoproterenol, since this effect is not seen in current in vitro cardiac tissue
models. We detected positive chronotropic, inotropic and lusitropic
responses to isoproterenol in early-stage intensity-trained tissues, with
EC50 (half-maximum effective concentration) values corresponding to
those observed in clinical studies (Fig. 3f-h, Extended Data Fig. 10a, b).
Tissue maturity was necessary to recapitulate critical aspects of
hypertrophic cardiomyopathy (HCM), a leading cause of sudden cardiac
death in athletes. As expected, hypertrophic tissues displayed decreased
beating frequency and increased durations of intracellular calcium
transients and decay times relative to healthy controls, and could not be
captured electromechanically when stimulated at high frequencies. The
onset of HCM diminished the FDAR and resulted in a negative FFR, in
contrast to healthy tissues. Differences between healthy and diseased
groups were most pronounced in the intensity-trained tissues (Fig. 3i, j,
Extended Data Fig. 10c-f).

Fig. 3 | Mature calcium handling in early-

A recent study reported the culture of large (7 × 7 mm² to
36 × 36 mm²) and thin (50 μm) human heart tissues, grown without
exogenous stimulation, that displayed less developed ultrastructure, no
evidence of oxidative metabolism, slightly negative FFR, comparable
action potential duration (APD) and conduction velocity, and approximately
fourfold higher generated force per unit cross-sectional tissue area when
compared to the tissues cultured here. It would be instructive to explore
how the different tissue geometries (very thin patches versus cylindrical
muscle) and culture protocols (no external stimulation versus intensity
training) contributed to these differences in structural and functional
tissue outcomes.

In summary, we have demonstrated that adult-like human cardiac 
tissue can be grown from hiPS-CMs in fibrin hydrogel subjected to 
stretch and auxotonic contractions in just four weeks of in vitro cul- 
ture. Two methodological advances underlie the accelerated cardiac 
maturation: the formation of tissues from early-stage hiPS-CMs, which 
displayed marked plasticity immediately after the initiation of sponta- 
neous contractions; and physical conditioning with increasing intensity 
(mimicking mechanical loading during the fetal-postnatal transition). 
Under these conditions, tissues developed adult-like gene expression 
and tissue ultrastructure throughout the tissue volume, oxidative 
metabolism, FDAR, positive FFR and physiological calcium handling. 

A notable result of our study is that highly accelerated and extensive 
maturation of molecular, structural and metabolic features of cardiac 
tissue was associated with slower and less complete establishment of 
mature cardiac function. We have demonstrated that physiological cell 
density is not sufficient to achieve adult-like mechanical function; that 
FDAR and positive FFR can be established at subnormal levels of force 
generation; and that T-tubules and oxidative metabolism are required 
for physiological FFR and calcium handling. Our tissue model does 
not recapitulate the macroscopic structure of the myocardium, and the 
maturation period of four weeks may be too short to establish all the 
functional features of adult myocardium. These factors may contribute 
to the contrast between the impressive morphological maturation and 
the less complete functional maturation. It would therefore be instruc- 
tive to use this human cardiac tissue model to study the progression of 
functional maturation. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
METHODS


Cardiac differentiation of human iPSCs. Human induced pluripotent stem cells
were obtained through material transfer agreements from S. Duncan, University
of Wisconsin (C2A line), B. Conklin, Gladstone Institute (WTC11 line) and M.Y.,
Columbia University (IMR90 line) and routinely checked for mycoplasma
contamination. iPSCs were expanded on growth-factor-reduced Matrigel-coated plates
(Corning) in mTeSR1 medium (Stemcell Technologies) that was changed on a
daily basis, and passaged at 85-95% confluence in a 1:6 split using Accutase (Life
Technologies). For the first 24 h after passaging, the culture medium was
supplemented with 5 μM Y-27632 dihydrochloride (Tocris, 1254).

Cardiac differentiation of iPSCs was initiated in confluent monolayers by
replacing the mTeSR1 medium with RPMI + B27 minus insulin medium, consisting of
RPMI-1640 (Life Technologies), 1× B27 supplement without insulin (a source of
omega-3 fatty acids and the thyroid hormone that promotes cardiac maturation;
Life Technologies), 100 U penicillin (Life Technologies), 0.1 mg/ml
streptomycin (Life Technologies) and 50 μg/ml ascorbic acid (Sigma, A4544). During
the first 24 h, the medium was further supplemented with activin A (50 ng/ml, R&D
Systems) and bone morphogenetic protein 4 (BMP4, 25 ng/ml, R&D Systems).
From 24 to 72 h, the RPMI + B27 minus insulin medium was supplemented with vascular
endothelial growth factor (VEGF, 10 ng/ml, R&D Systems). From 72 h through
to the end of the differentiation process (up to 12 days), RPMI + B27 medium,
consisting of RPMI-1640, 1× B27 supplemented with insulin (Life Technologies),
100 U penicillin, 0.1 mg/ml streptomycin and 50 μg/ml ascorbic acid, was used and
refreshed every two days. At day 12, the cells were characterized by flow
cytometry using the cardiomyocyte-specific marker cTnT (clone 13-11, NeoMarkers).
Differentiation typically resulted in cell populations containing 80-90% cTnT+
cells at day 12, which were subsequently used in experiments without selection
for cardiomyocytes.

Human fetal cardiac tissues. Fetal hearts were purchased as surgical waste from
Advanced Bioscience Resources (Alameda, CA), and delivered on ice within
2.5 h of surgery. Left ventricles were sectioned from the apex towards the atria
into 7 mm long × 2 mm wide strips, washed three times in Hank's Balanced Salt
Solution (Gibco), transferred to low-attachment six-well plates (Nunc) containing
RPMI + B27 medium, and placed in the incubator for 1 h before measurements
were taken. FCT strips were analysed in a similar manner to the cardiac tissues for
contractile behaviour, force generation, gene expression, cardiac proteins,
ultrastructure and histomorphology, as detailed below. In addition, RNA isolated from
32 pooled fetal hearts, from gestational weeks (GW) 21-37, was obtained from
Clontech (Mountain View, CA) for gene expression studies.

Human adult heart tissue. Adult heart cDNA (Clontech, 637213 and 3H 
Biomedical AB, SC6214) was used for measurement of gene expression. Tissue 
samples from adult left ventricles were obtained as surgical waste through an insti- 
tutional review board at Columbia University. 

Tissue bioreactor platform. The platform was assembled from two separate 
components: the wells for tissue culture, and an array of support structures with 
integrated elastomeric pillars for tissue attachment (1 mm in diameter, 6 mm
axis-to-axis distance). Both components were fabricated out of polycarbonate using
a computer numerical control (CNC) milling machine with mating features for
stability and repeatable positioning (Extended Data Fig. 1a-c).

The pillars were formed by centrifugal casting of polydimethylsiloxane (PDMS,
Dow Corning Sylgard 184) through, and extending from, the polycarbonate
support structures. The supports were first inserted into Delrin (polyoxymethylene)
moulds fabricated by CNC machining, and PDMS (10:1 ratio of base:curing agent)
was centrifugally cast at a relative centrifugal force of 400g
for 5 min and cured in an oven at 60 °C for 1 h. The resulting component consisted
of three pairs of pillars to support the formation of three tissues (Extended Data
Fig. 1d). Pillars were 1 mm in diameter, 9 mm in length, and spaced 6 mm
axis-to-axis.

The platform contained 12 wells for tissue culture that were patterned with
the exact spacing of a 48-well plate, so that the platform corresponded to one
quarter of a standard 48-well plate. Each well had a bottom portion measuring
10 mm × 4 mm × 4 mm, into which the cells in hydrogel were introduced, and a wider
top portion measuring 10 mm × 7 mm × 4 mm for culture medium. A glass slide
was bonded to the bottom of the platform to enable microscopic observation.

Electrical stimulation of the cell-hydrogel tissues was performed using carbon
rods (Ladd Research Industries) as electrodes. The carbon rods were placed into
slots machined on each side of the culture well, aligned in parallel and positioned
perpendicular to the long axis of both the culture well and the tissue. The
electrodes were connected to a cardiac stimulator (Grass S88X) by platinum wires
(Ladd Research Industries). Electrical stimulation was delivered as a spatially
uniform, pulsatile electrical field (4.5 mV intensity, 2 ms in duration, monophasic
square waveform) perpendicular to the long axis of the tissue. The stimulation
parameters (amplitude, duration, frequency and waveform) were controlled by the
Grass S88X cardiac stimulator.


Culture of cardiac tissues. Differentiated hiPS-CMs were combined with
supporting human dermal fibroblasts (Lonza; cultured in Dulbecco's Modified Eagle
Medium (DMEM) supplemented with 10% v/v fetal bovine serum, 100 U penicillin
and 0.1 mg/ml streptomycin) at a ratio of 75% hiPS-CMs to 25% fibroblasts.
The cells were subsequently encapsulated in fibrin hydrogel by mixing 20 mg/ml
human fibrinogen (Sigma), 100 U/ml human thrombin (Sigma-Aldrich) and the
cell suspension at a 3:1:1 ratio. The hydrogel solution (200 μl containing 2 million
cells) was dispensed into each well of the platform and allowed to polymerize at
37 °C for 30 min, so that the tissues readily formed around the pillars. Then, 800 μl
of RPMI + B27 medium containing 0.2 mg/ml aprotinin (Sigma-Aldrich, A3428)
was added to each well, with an additional 30 ml of the same medium added to a
100-mm Petri dish (Corning, 430591) containing one platform (12 tissues).
Subsequently, the medium was changed every other day: 30 ml RPMI + B27 medium
containing 0.2 mg/ml aprotinin (Sigma-Aldrich, A3428) for the first seven days,
and then 30 ml RPMI + B27 medium (either days 7-28 or days 7-84).
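The mixing step above fixes the component volumes and final concentrations by simple arithmetic. The sketch below works them out for one tissue, assuming that the 3:1:1 ratio refers to fibrinogen solution : thrombin solution : cell suspension in the order listed and that no other volume is added to the 200 μl. The function name `fibrin_mix` and its defaults are illustrative only, not the authors' protocol code.

```python
def fibrin_mix(total_ul=200.0, cells=2_000_000, cm_fraction=0.75,
               fibrinogen_mg_per_ml=20.0, thrombin_u_per_ml=100.0,
               ratio=(3, 1, 1)):
    """Hypothetical helper: volumes and final concentrations for one tissue.

    Assumes ratio = (fibrinogen, thrombin, cell suspension) by volume and
    that these three components make up the whole hydrogel volume.
    """
    parts = sum(ratio)
    v_fib, v_thr, v_cells = (total_ul * r / parts for r in ratio)
    return {
        "fibrinogen_ul": v_fib,
        "thrombin_ul": v_thr,
        "cell_suspension_ul": v_cells,
        # each stock is diluted by (component volume / total volume)
        "final_fibrinogen_mg_per_ml": fibrinogen_mg_per_ml * v_fib / total_ul,
        "final_thrombin_u_per_ml": thrombin_u_per_ml * v_thr / total_ul,
        # density the cell suspension needs before mixing (cells per ml)
        "cell_suspension_per_ml": cells * 1000.0 / v_cells,
        # 75% hiPS-CMs : 25% fibroblasts
        "hips_cms": cells * cm_fraction,
        "fibroblasts": cells * (1.0 - cm_fraction),
    }


if __name__ == "__main__":
    for key, value in fibrin_mix().items():
        print(f"{key}: {value:g}")
```

With the defaults, this gives 120 μl fibrinogen, 40 μl thrombin and 40 μl cell suspension (about 50 million cells per ml before mixing), final concentrations of 12 mg/ml fibrinogen and 20 U/ml thrombin, and 1.5 million hiPS-CMs with 0.5 million fibroblasts per tissue.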

The pillars were designed to subject the tissues to mechanical loading, mim- 
icking that in native human myocardium. Hydrogel compaction caused passive 
tension in the tissues as they were stretched between the two pillars, inducing 
elongation and alignment. Synchronous contractions induced by electrical stim- 
ulation generated dynamic forces in the contracting tissues attached to the pillars 
that were forced to work against the load. 

Electrical stimulation was initiated on day seven, using one of three training
regimes (Fig. 1a): control (no electrical stimulation, 0 Hz), constant (constant
frequency of 2 Hz), and intensity training (a ramped stimulation that increased the
frequency from 2 Hz on day 7 to 6 Hz on day 21, by 0.33 Hz per day; tissues were
then stimulated at 2 Hz until day 28) (Extended Data Fig. 1e). Engineered tissues
were randomly assigned to experimental groups. Tissues were cultured for a period
of four weeks in 16 independent experiments, using three lines of iPSCs. Sample
sizes for the main figures are shown in Supplementary Information, 'Main figure
data sample sizes'.
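The three regimes can be written as a day-by-day frequency schedule. This is a minimal sketch under stated assumptions: `stimulation_hz` is a hypothetical helper, and because a 0.33 Hz daily step from 2 Hz on day 7 would overshoot 6 Hz before day 21, the sketch caps the ramp at 6 Hz. The text does not say how the stimulator handled that overshoot.

```python
def stimulation_hz(day, regime):
    """Hypothetical helper: pacing frequency (Hz) applied on a culture day.

    Assumptions: stimulation starts on day 7 in every stimulated regime;
    the intensity ramp adds 0.33 Hz per day and is capped at 6 Hz; from
    day 22 onwards intensity-trained tissues return to 2 Hz.
    """
    if regime not in ("control", "constant", "intensity"):
        raise ValueError(f"unknown regime: {regime!r}")
    if regime == "control" or day < 7:
        return 0.0
    if regime == "constant":
        return 2.0
    if day <= 21:
        # linear ramp from 2 Hz on day 7, capped at 6 Hz
        return min(2.0 + 0.33 * (day - 7), 6.0)
    return 2.0


if __name__ == "__main__":
    for day in range(7, 29):
        print(day, round(stimulation_hz(day, "intensity"), 2))
```

Under the capping assumption, the ramp reaches 5.96 Hz on day 19, is held at 6 Hz on days 20 and 21, and then drops back to 2 Hz.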

Tissue properties were evaluated using real-time assessment of: amplitude and 
frequency of contractions, calcium handling, force generation, excitation threshold 
and maximum capture rate. End-point assays were performed to determine cell 
and tissue morphology (histologically), ultrastructure (by transmission electron 
microscopy), gene expression (using quantitative real-time PCR with reverse 
transcription (RT-PCR)) and the presence and distribution of cardiac proteins 
(immunohistochemistry). 

Contractility analysis. Tissue contractility was measured by tracking the change in
tissue area as a function of time. Live-cell, bright-field videos were acquired at rates
of up to 150 frames per second using a Pike F-032b (Allied Vision Technologies)
camera controlled with custom SPLASSH software. Acquired video frames were
inverted and an automated intensity threshold was used to identify cell location in
the video frame. First, a baseline timepoint in the video corresponding to a relaxed
tissue state was selected. Absolute differences in cell area from the baseline frame
were then calculated to create a time course of cell area dynamics. The resulting
time courses were analysed using a native MATLAB automated peak-finding
algorithm, with the local maxima of each time course marking the points of
maximum cell contraction. Beat periods were determined as the time between
successive local maxima, and beat frequencies as the inverse of the beat periods.
Relaxation times were measured as the time required for the tissue to
relax from the peak contraction amplitude of each local maximum to the calculated
relaxation amplitude (for example, the R90 time was the time elapsed between full
contraction and 10% contraction).
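The beat-period, beat-frequency and R90 calculations above can be sketched as follows. The authors used MATLAB's built-in peak finding. This sketch substitutes a simple NumPy neighbour comparison with a minimum height, so `local_maxima`, `beat_metrics` and `synthetic_trace` are hypothetical names, and the triangular test beats are synthetic, not measured data.

```python
import numpy as np


def local_maxima(x, min_height):
    """Indices of interior samples that are higher than the previous sample,
    not lower than the next one, and at least min_height."""
    x = np.asarray(x, dtype=float)
    core = x[1:-1]
    mask = (core > x[:-2]) & (core >= x[2:]) & (core >= min_height)
    return np.nonzero(mask)[0] + 1


def beat_metrics(area_change, fps, min_height, relax_fraction=0.9):
    """Beat periods (s), beat frequencies (Hz) and relaxation times (s).

    area_change: absolute difference in tissue area from a relaxed baseline
    frame, one value per video frame. relax_fraction=0.9 gives R90, the time
    from each peak until the signal first falls to 10% of that peak.
    """
    x = np.asarray(area_change, dtype=float)
    peaks = local_maxima(x, min_height)
    periods = np.diff(peaks) / fps
    frequencies = 1.0 / periods
    relax = []
    for p in peaks:
        target = x[p] * (1.0 - relax_fraction)
        below = np.nonzero(x[p:] <= target)[0]
        if below.size:  # skip a final beat cut off by the end of the video
            relax.append(below[0] / fps)
    return periods, frequencies, np.asarray(relax)


def synthetic_trace(n_frames=200, beat_every=50, rise=10, decay=25):
    """Triangular test beats: linear rise to 1.0, then linear decay to 0."""
    x = np.zeros(n_frames)
    for start in range(0, n_frames - rise - decay, beat_every):
        x[start:start + rise + 1] = np.linspace(0.0, 1.0, rise + 1)
        x[start + rise:start + rise + decay + 1] = np.linspace(1.0, 0.0, decay + 1)
    return x


if __name__ == "__main__":
    periods, freqs, r90 = beat_metrics(synthetic_trace(), fps=100, min_height=0.5)
    print(periods, freqs, r90)
```

For the synthetic trace sampled at 100 frames per second, the beats are 50 frames apart (0.5 s, 2 Hz), and each decay crosses 10% of the peak 23 frames after the peak (R90 = 0.23 s). On real recordings, some smoothing or a minimum peak spacing would probably be needed to reject noise peaks.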

Calcium handling. Tissues within culture platforms were loaded with Fluo-4
NW (50% v/v, Life Technologies) in RPMI + B27 medium for 30 min at 37 °C; where
necessary to reduce movement artefacts, the medium contained 5 μM blebbistatin (Sigma).
Videos were acquired at a rate of 150 frames per second using a Pike F-032
camera (Allied Vision Technologies) as described in 'Contractility analysis'.
Videos were analysed in MATLAB using a custom script that calculated the
temporal changes in calcium fluorescence intensity. Specifically, each frame was
normalized to a baseline background region chosen by the user to give
baseline-corrected changes in minimum and maximum fluorescence values for each
frame. The temporal change in fluorescence intensity was presented as a
calcium transient trace from which the measurements were obtained. In brief, the
calcium transient 'timing' was determined as the peak-to-peak interval between two
successive beats, as defined by the peak maxima. Calcium transient 'amplitude'
was determined by numerically integrating the area below the peak maxima
relative to the baseline. Calcium transient traces were analysed during 5 mM
caffeine stimulation of tissues previously treated with either 1 mM verapamil
(Sigma-Aldrich) or 1 μM thapsigargin (Sigma-Aldrich). Caffeine responses were
quantified by comparing the calcium transient amplitude before and after the
addition of 5 mM caffeine (Sigma-Aldrich).
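The timing and integrated-amplitude measures above can be sketched in a few lines of NumPy. This is not the authors' MATLAB script. The text does not define the baseline or the integration window for each beat, so the sketch makes three assumptions: the background region is subtracted frame by frame, the lowest corrected value is taken as the baseline, and beats are split at the minimum between successive peaks. `calcium_metrics` and `caffeine_response` are hypothetical names.

```python
import numpy as np


def local_maxima(x, min_height):
    """Interior samples higher than the previous one, not lower than the next."""
    x = np.asarray(x, dtype=float)
    core = x[1:-1]
    mask = (core > x[:-2]) & (core >= x[2:]) & (core >= min_height)
    return np.nonzero(mask)[0] + 1


def trapezoid(y, fps):
    """Trapezoidal integral of evenly sampled values (signal units x s)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0) / fps)


def calcium_metrics(roi, background, fps, min_height):
    """Transient timing (s) and integrated amplitude for each beat."""
    f = np.asarray(roi, dtype=float) - np.asarray(background, dtype=float)
    f = f - f.min()  # assumption: lowest corrected value is the baseline
    peaks = local_maxima(f, min_height)
    timing = np.diff(peaks) / fps  # peak-to-peak intervals
    # split the trace at the minimum between successive peaks (assumption)
    splits = [int(a + np.argmin(f[a:b])) for a, b in zip(peaks[:-1], peaks[1:])]
    bounds = [0] + splits + [len(f) - 1]
    amplitudes = np.array([trapezoid(f[s:e + 1], fps)
                           for s, e in zip(bounds[:-1], bounds[1:])])
    return timing, amplitudes


def caffeine_response(amplitude_before, amplitude_after):
    """Relative change in transient amplitude after caffeine (0.5 = +50%)."""
    return (amplitude_after - amplitude_before) / amplitude_before
```

Integrating the whole transient, rather than reading off its peak height, makes the amplitude sensitive to both the height and the duration of the calcium release.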
Conduction velocity. A surrogate of conduction velocity was assessed from calcium
propagation across entire tissues that were pre-treated with 5 μM blebbistatin
(Sigma-Aldrich) to separate true Ca2+-dependent fluorescence changes from the
fluorescence signals caused by motion artefacts. The conduction velocity was
calculated by selecting two regions of the tissue within the area of calcium
transient propagation and dividing the distance between the centres of these
regions by the difference between the times of their peak maxima.
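The two-region calculation above reduces to a distance divided by a peak-time delay. The sketch below assumes a single calcium transient per region trace and region centres given in millimetres. `conduction_velocity_cm_s` is a hypothetical name, and the Gaussian test pulses are synthetic.

```python
import math

import numpy as np


def conduction_velocity_cm_s(trace_a, trace_b, centre_a_mm, centre_b_mm, fps):
    """Distance between two region centres divided by the delay between the
    peak maxima of their calcium traces (one transient per trace assumed)."""
    distance_cm = math.dist(centre_a_mm, centre_b_mm) / 10.0
    delay_s = abs(int(np.argmax(trace_b)) - int(np.argmax(trace_a))) / fps
    if delay_s == 0:
        raise ValueError("peaks fall in the same frame; delay is below the frame interval")
    return distance_cm / delay_s


if __name__ == "__main__":
    frames = np.arange(100)
    a = np.exp(-((frames - 30) / 3.0) ** 2)
    b = np.exp(-((frames - 45) / 3.0) ** 2)
    print(conduction_velocity_cm_s(a, b, (0.0, 0.0), (2.5, 0.0), fps=150))
```

Frame-based peak timing is coarse: at 150 frames per second, a wave travelling at about 25 cm s⁻¹ covers roughly 1.7 mm per frame. The two regions therefore need to be several millimetres apart, or the peak times interpolated between frames, for the delay to be resolved.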

Direct measurements of force. Force generation was measured directly, using
an organ bath with high-sensitivity force transducers. Cardiac tissues and FCT
strips were transferred to a commercial organ bath system (DMT Myograph)
containing oxygenated modified Tyrode's solution (129 mM NaCl, 5 mM KCl, 2 mM
CaCl2, 1 mM MgCl2, 30 mM glucose, 25 mM HEPES, pH 7.4) supplemented with
2% B27 and maintained at a constant temperature of 37 °C without electrical
stimulation. All measurements were made using LabChart software (ADInstruments).
The tissues were allowed to equilibrate for 15 min and any spontaneous beating
was recorded. The tissues were then allowed to equilibrate for
another 15 min under electrical stimulation (2 Hz, 5 ms, 80-100 mA, rectangular
pulses), during which they were preloaded by manual stepwise adjustment of the
tissue length to that generating maximal force, on the assumption that the
optimal sarcomere length is thereby attained.

Twitch tension was measured while increasing the organ bath [Ca2+] from 0.2 to
2.8 mmol/l. Specifically, the extracellular calcium concentration was changed by
changing the concentration of CaCl2 in the Tyrode's solution. The tissues were
subjected to electrical stimulation for 3 min, and an average of 10 contractions
was measured. The stimulation was then discontinued for 10, 20 or 30 s, and the
tissues were allowed to recover for 2 min. Post-rest potentiation measurements
were subsequently obtained by analysing the change in twitch tension of the
first beat upon re-initiation of electrical stimulation.
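A common way to express post-rest potentiation is the percentage change of the first post-rest twitch relative to the steady-state twitch. The sketch below assumes that the reference is the mean of the 10 pre-rest contractions described above. The text does not state the exact formula, and `post_rest_potentiation_pct` is a hypothetical name.

```python
import numpy as np


def post_rest_potentiation_pct(pre_rest_twitches, first_post_rest_twitch):
    """Percent change of the first twitch after a rest pause relative to the
    mean of the pre-rest twitches (assumed reference: the averaged
    steady-state contractions)."""
    reference = float(np.mean(pre_rest_twitches))
    return 100.0 * (first_post_rest_twitch - reference) / reference
```

A positive value indicates that calcium accumulated in the SR during the pause was released on the first beat. The metric would be computed separately for the 10, 20 and 30 s rest intervals.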

Contractility and twitch parameters were further investigated in response to
increasing electrical stimulation frequency within the organ bath, as previously
described. Twitch forces were calculated as the average difference between
cyclic peak maximum and minimum force, normalized to the cross-sectional
area (obtained by measuring the tissue at its centre after force measurements).
The force-frequency relationship was measured by increasing the electrical
stimulation frequency from 1 Hz to 6 Hz in 1-Hz increments. The tissues were
subjected to each stimulation frequency for 30 s before the frequency was
increased. Force was recorded at 1-6 Hz (in 1-Hz increments) for all experimental
groups (static, constant, early-stage and late-stage intensity-trained tissues,
and human fetal tissue strips) and all iPSC lines.
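The twitch-force normalization and the force-frequency relationship can be sketched as follows. The sketch makes three assumptions: a circular cross-section (the tissues are described as 1.8 mm in diameter), a known force sampling rate, and a straight-line fit across frequencies as a summary of the FFR, where a positive slope indicates a positive FFR. `twitch_stress` and `ffr_slope` are hypothetical names, not part of LabChart or the authors' code.

```python
import math

import numpy as np


def twitch_stress(force_mN, samples_per_s, stim_hz, diameter_mm):
    """Mean peak-to-trough force per stimulation cycle divided by the
    cross-sectional area, in mN per mm^2 (circular cross-section assumed).

    Each window spans one stimulation period, so it contains both the
    peak and the trough of that cycle regardless of phase.
    """
    f = np.asarray(force_mN, dtype=float)
    n = int(round(samples_per_s / stim_hz))  # samples per cycle
    if len(f) < n:
        raise ValueError("recording shorter than one stimulation cycle")
    cycles = f[: (len(f) // n) * n].reshape(-1, n)
    amplitude = float(np.mean(cycles.max(axis=1) - cycles.min(axis=1)))
    area_mm2 = math.pi * (diameter_mm / 2.0) ** 2
    return amplitude / area_mm2


def ffr_slope(freqs_hz, stresses):
    """Slope of a straight-line fit of twitch stress against frequency."""
    slope, _intercept = np.polyfit(freqs_hz, stresses, 1)
    return float(slope)
```

A linear slope is only one summary of the FFR. The text compares forces across frequencies without specifying a fit, so plotting stress against frequency for each group remains the primary readout.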

Continuous recordings of force and calcium as a function of frequency.
Continuous videos were recorded at a rate of 100 frames per second with a Zyla 4.2
sCMOS camera (Andor) to determine calcium transients and tissue displacement.
The stimulation frequency was increased from 1 Hz to 6 Hz in 1-Hz increments
every 20 s (that is, every 2,000 frames). The calcium transients were analysed using
custom MATLAB software as described above for measurements of calcium traces,
and normalized to the baseline at each frequency as (F − F0)/F0. Tissue displacement
was measured using the Spottracker module in ImageJ. Areas within the tissue
were manually selected at baseline and tracked frame-to-frame to measure changes
in pixel displacement over time. Calcium dye loading was performed as described
above, but without blebbistatin, so that contractile motion was not blocked.
This enabled measurements of both calcium transient intensity and displacement
during calcium imaging.
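Because the frequency was stepped every 2,000 frames (20 s at 100 frames per second), a continuous recording can be cut into one segment per frequency and each segment normalized to its own baseline. This is a minimal sketch, assuming the recording starts exactly at the 1 Hz step and that F0 is the minimum of each segment. The text does not define F0, and `split_and_normalize` is a hypothetical name.

```python
import numpy as np


def split_and_normalize(trace, frames_per_step=2000, freqs_hz=range(1, 7)):
    """Split a continuous fluorescence trace into one segment per pacing
    frequency and express each segment as (F - F0) / F0 with its own F0.

    Assumption: F0 is the minimum of each segment."""
    trace = np.asarray(trace, dtype=float)
    out = {}
    for i, hz in enumerate(freqs_hz):
        seg = trace[i * frames_per_step:(i + 1) * frames_per_step]
        if seg.size == 0:
            break  # recording ended before this frequency step
        f0 = seg.min()
        out[hz] = (seg - f0) / f0
    return out
```

Normalizing each step separately removes slow drifts in dye loading or bleaching across the two-minute recording, at the cost of making absolute baselines incomparable between frequencies.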

Immunofluorescent staining. For morphological analysis, tissues and FCTs were
fixed using gradually increasing concentrations of paraformaldehyde (1-4%,
in 1% increments, 1 h each). Whole tissues were paraffin-embedded and cut into
5-μm-thick sections. The sectioned tissues were quenched in 0.5 M NH4Cl for
30 min, permeabilized with 0.2% Triton X-100 in PBS for 15 min and then
incubated in blocking solution (1% bovine serum albumin (BSA), 2% goat serum in
PBS) for 2 h. The following primary antibodies were incubated for 2 h in 1% BSA:
anti-sarcomeric α-actinin (1:200; Abcam ab9465), anti-cardiac troponin T (cTnT,
1:100; Thermo Scientific MS-295-P1), anti-ryanodine receptor 2 (RYR2, 1:100;
Abcam ab2827), anti-CaV1.2 (1:200; Abcam ab58552), anti-BIN1 (1:100; Abcam
ab137459), anti-mitochondria (1:50; Abcam ab3298) and anti-OXPHOS (1:100;
Acris MS601-720). Actin was detected with Alexa Fluor 350-phalloidin (Thermo
Fisher A22281).

Tissues were washed three times for 5 min in 0.2% Triton X-100 and
incubated with the corresponding secondary antibodies for 2 h: anti-mouse IgG-Alexa
Fluor 488 (1:400; Invitrogen A21202), anti-rabbit IgG-Alexa Fluor 568 (1:400;
Invitrogen 81-6114) and anti-mouse IgG-Alexa Fluor 635 (1:400; Invitrogen
A31574). The tissues were washed and subsequently incubated with NucBlue 
(Molecular Probes, R37606) for nucleus counterstaining. The immunostained 
tissues were visualized using a confocal microscope (Olympus Fluoview FV 1000). 

For T-tubule staining, tissues were incubated with WGA-Alexa Fluor
488 (Life Technologies, W11261) or di-8-ANEPPS (Life Technologies, D-3167) for
20 min before permeabilization and subsequent staining with additional antibodies
as described above.

Transmission electron microscopy. Tissues, FCTs and adult heart tissue were
fixed with 2.5% glutaraldehyde in 0.1 M Sorenson's buffer (pH 7.2) for 1 h and
sent to the Electron Microscopy and Histology (EM&H) Core Facility at Weill
Cornell Medical College for subsequent sample preparation, imaging and data
interpretation in a blinded fashion. Samples were post-fixed for an additional hour
with 1% OsO4 in Sorenson's buffer. After dehydration, the samples were embedded,
sectioned, stained with toluidine blue and examined under a JEM-1400 electron
microscope.

Fraction of cells containing sarcomeres. Sectioned tissues were immunofluorescently
labelled for sarcomeric α-actinin and stained with DAPI. Using the Cell Counter plugin
in ImageJ, the DAPI-positive cells were marked and counted. All DAPI-positive cells
that also stained positive for α-actinin were then counted, and the percentage of
DAPI-positive cells containing sarcomeres was calculated.
Sarcomere length. Sarcomere length was determined in dissociated cells that
were replated as a monolayer and stained for sarcomeric α-actinin, by measuring
the distance between intensity peaks along the long axis of designated cell areas
containing clear striations’. A minimum of three sarcomere lengths per cell was
obtained in large numbers of cells from n > 6 biological replicates.
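The peak-to-peak measurement can be sketched in a few lines. This is a minimal illustration, not the authors' software: it assumes the α-actinin intensity along the cell's long axis has already been exported as a list of pixel values, and the pixel size, the `min_separation_px` noise guard and all function names are hypothetical.

```python
import math


def local_maxima(profile):
    """Indices of interior samples that are local intensity maxima.

    A sample counts as a peak if it is strictly higher than its left neighbour
    and at least as high as its right neighbour, so a flat-topped peak is
    counted once (at its first sample).
    """
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
    ]


def sarcomere_lengths(profile, um_per_pixel, min_separation_px=3):
    """Peak-to-peak distances (µm) along a line profile over a striated region.

    `min_separation_px` is a hypothetical noise guard: peaks closer together
    than this are treated as the same striation.
    """
    kept = []
    for p in local_maxima(profile):
        if not kept or p - kept[-1] >= min_separation_px:
            kept.append(p)
    return [(b - a) * um_per_pixel for a, b in zip(kept, kept[1:])]


def mean_sarcomere_length(profile, um_per_pixel, min_lengths=3):
    """Mean sarcomere length for one cell, requiring at least `min_lengths` values."""
    lengths = sarcomere_lengths(profile, um_per_pixel)
    if len(lengths) < min_lengths:
        raise ValueError("fewer than %d sarcomere lengths in profile" % min_lengths)
    return sum(lengths) / len(lengths)


# Synthetic striation profile: one intensity peak every 20 pixels at 0.1 µm/pixel.
profile = [math.cos(2 * math.pi * i / 20) for i in range(100)]
print(sarcomere_lengths(profile, 0.1))  # [2.0, 2.0, 2.0]
```

The `min_lengths=3` default mirrors the "minimum of three sarcomere lengths per cell" stated above; real profiles would usually be smoothed first.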

Change in tissue area. The change in the projected tissue area (per cent change
between the contracted and relaxed state) was determined experimentally under
bright-field illumination by analysing tissues paced at 1 Hz and twice the excitation
threshold, using custom-designed MATLAB code that performed video edge detection
based on the contrast between the darker tissue and the lighter surrounding
area. For each group and time point, the change in area was normalized to the
change in area measured at day 6, immediately before the application of electrical
stimulation.
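The area measurement and the day-6 normalization can be illustrated with a simple intensity threshold standing in for the authors' MATLAB edge detection (a swapped-in, simpler technique). The sketch assumes grey-scale frames given as nested lists, a fixed hypothetical threshold, and that the relaxed state is the frame with the largest projected area; all names are hypothetical.

```python
def tissue_area_px(frame, threshold):
    """Projected tissue area in pixels: count of pixels darker than `threshold`.

    Assumes the tissue is darker than the surrounding bright-field background;
    a fixed threshold replaces the edge detection used in the study.
    """
    return sum(1 for row in frame for value in row if value < threshold)


def percent_area_change(frames, threshold):
    """Per cent change between relaxed (largest) and contracted (smallest) area."""
    areas = [tissue_area_px(f, threshold) for f in frames]
    relaxed, contracted = max(areas), min(areas)
    return 100.0 * (relaxed - contracted) / relaxed


def normalize_to_day6(change_by_day):
    """Express each day's area change as a ratio of the day-6 (pre-stimulation) value."""
    baseline = change_by_day[6]
    return {day: change / baseline for day, change in change_by_day.items()}


def block_frame(rows, cols, size=10, dark=0, bright=255):
    """Synthetic frame: a dark rows x cols rectangle on a bright background."""
    return [[dark if (r < rows and c < cols) else bright for c in range(size)]
            for r in range(size)]


relaxed = block_frame(4, 5)     # 20 dark pixels
contracted = block_frame(4, 4)  # 16 dark pixels
print(percent_area_change([relaxed, contracted], threshold=128))  # 20.0
print(normalize_to_day6({6: 10.0, 28: 15.0}))  # {6: 1.0, 28: 1.5}
```

A ratio above 1 at a later time point then means the tissue shortens more than it did at day 6, which is how Extended Data Fig. 5c presents the data.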

Cell morphology. Tissues were enzymatically dissociated using serial digestions with
collagenase types 1 and 2 (Worthington), and the cells were plated onto eight-well
chamber slides (Lab-Tek, Sigma-Aldrich). The cells were allowed to attach for 72 h and
imaged using phase-contrast microscopy. Cell area was quantified from the images
using the ‘%Area’ function in ImageJ after thresholding of the cells in each image.
Cell elongation ratio was calculated from these images using the ‘Roundness’
function in ImageJ, with the elongation ratio defined as (1 − Roundness), so that 0
corresponds to a circle and 1 to a completely elongated object*?.
RT-PCR. Total RNA was purified from tissues using TRIzol (Life Technologies)
according to the manufacturer's instructions. For measurements of adult and
FCT NPPA and NPPB expression, commercial RNA samples were used: adult heart (ages
30-39, pooled from three male hearts; TaKaRa/Clontech Human RNA Master
Panel II, 636643, lot no. 1208462 A) and fetal heart (GW 21-37, pooled from 32 male
and female fetal hearts; Clontech Human Fetal Heart Poly-A+ RNA, 636156, lot
no. 7110214; synthesized with oligo-dT20 and SSIII kit). Reverse transcription was
performed using Ready-To-Go You-Prime First-Strand Beads (GE Healthcare,
27-9264-01) following the manufacturer’s instructions. Gene expression was
quantified by real-time PCR using SYBR Green primers (Life Technologies) on
an Applied Biosystems StepOnePlus system. In Fig. 1b and Extended Data Fig. 2a,
data are presented as log2-fold change normalized to gene expression in late-stage
tissue at week 1. In Extended Data Fig. 2b and 9c, data are presented as fold
change normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
expression. Primers used are listed in Supplementary Information, ‘Primer list’.
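The Methods do not state how fold changes were derived from the threshold cycles (Ct). A common convention is the 2^−ΔΔCt method, which assumes close to 100% amplification efficiency. The sketch below applies that assumed convention to the two normalizations described above, one relative to a reference gene (such as GAPDH) and one relative to a calibrator sample (such as late-stage week-1 tissue). The Ct values and function names are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Target-gene Ct minus reference-gene (e.g. GAPDH) Ct for one sample."""
    return ct_target - ct_reference


def relative_to_reference(ct_target, ct_reference):
    """Expression normalized to the reference gene: 2 ** -(delta Ct)."""
    return 2.0 ** -delta_ct(ct_target, ct_reference)


def log2_fold_change(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """log2 fold change versus a calibrator sample (e.g. late-stage week-1 tissue).

    Under the 2 ** -ddCt convention, log2 fold change equals -ddCt.
    """
    dd_ct = (delta_ct(ct_target, ct_reference)
             - delta_ct(ct_target_cal, ct_reference_cal))
    return -dd_ct


# Hypothetical Ct values: sample target 22, GAPDH 18; calibrator target 25, GAPDH 18.
print(relative_to_reference(22.0, 18.0))         # 0.0625 (= 2 ** -4)
print(log2_fold_change(22.0, 18.0, 25.0, 18.0))  # 3.0 (an 8-fold increase)
```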

Oxygen consumption rate and extracellular acidification rate. Engineered cardiac
tissues were dissociated into single cells after four weeks of cultivation using
activated papain solution containing 20 U/ml papain (~15 min at 37 °C with gentle
tapping), as described for electrophysiological recordings. The enzyme reaction
was terminated by adding 10% FBS in DMEM/F-12 culture medium. The dissociated
hiPS-CMs were plated into XF96 Culture Plates (Seahorse Bioscience)
coated with Matrigel (Corning, 354230) and cultured for three days. Subsequently,
the plated hiPS-CMs were assayed in real time using an XF96 Extracellular Flux
Analyzer (Seahorse Bioscience) for oxygen consumption rate (OCR) and extracellular
acidification rate (ECAR) according to the manufacturer's protocols.
Isoproterenol response. Isoproterenol was diluted in standard medium
(RPMI + B27) to a concentration of 1 µM. Tissues were placed in the organ bath
(as previously described) and equilibrated for 10 min. Videos were captured and force
measurements were recorded before and after addition of the drug, and the change
in the generated force was determined.

Patch-clamp electrophysiology. Engineered cardiac tissues were dissociated into
single cells for whole-cell patch-clamp recordings at four weeks using activated
papain solution containing 20 U/ml papain from Carica papaya (Sigma-Aldrich
76220), 1.1 mM EDTA, 67 µM 2-mercaptoethanol (Sigma-Aldrich M3148) and
5.5 mM L-cysteine-HCl (Sigma-Aldrich C7880) in 1× EBSS (Thermo Scientific/
Gibco 24010-043). This optimized protocol yielded healthy, patchable single
cardiomyocytes without spontaneous beating from early-stage
intensity-trained cardiac tissues. The same papain dissociation protocol
was also used for the other cardiac tissue samples. The tissues were incubated for
~15 min at 37 °C with gentle tapping, and the enzyme reaction was terminated by
adding FBS (10%) in DMEM/F-12 culture medium.

Whole-cell patch-clamp recordings of dissociated iPSC-CMs were conducted
using a patch-clamp amplifier (MultiClamp 700B, Molecular Devices) and an
inverted microscope equipped with differential interference contrast optics (Nikon, Ti-U).
Patch pipettes were pulled from borosilicate glass capillaries (Sutter Instrument
BF150-110-10) using a micropipette puller (Sutter Instrument, Model P-97).

Current-clamp recordings for action potential measurements were conducted in
normal Tyrode's solution containing 140 mM NaCl, 5.4 mM KCl, 1 mM MgCl2,
10 mM glucose, 1.8 mM CaCl2 and 10 mM HEPES (pH 7.4 with NaOH at 25 °C),
using a pipette solution containing 120 mM K D-gluconate, 25 mM KCl, 4 mM MgATP,
2 mM NaGTP, 4 mM Na2-phosphocreatine, 10 mM EGTA, 1 mM CaCl2 and 10 mM
HEPES (pH 7.4 with KOH at 25 °C). Action potentials were elicited (5 ms, 0.3 nA)
in current-clamp mode at 37 °C (0.2 Hz), and were recorded and analysed using
Clampfit 10.4 (Axon Instruments).

Voltage-clamp measurements for IK1 recording were conducted using
an extracellular solution containing 160 mM NMDG, 5.4 mM KCl, 2 mM MgCl2,
10 mM glucose, 10 µM nisoldipine, 1 µM E-4031 and 10 mM HEPES (pH 7.2 with
HCl at 25 °C) and a pipette solution containing 150 mM K-gluconate, 5 mM EGTA,
1 mM Mg-ATP and 10 mM HEPES (pH 7.2 with KOH at 25 °C). The following pulse
protocol was used: 2-s voltage steps from −130 to +10 mV (holding
potential −40 mV, 0.1 Hz). Positive to its reversal potential, IK1 (the Ba2+-sensitive
current) showed a negative slope conductance consistent with inward rectification, as
previously described*’. The current-voltage relationship was analysed before and
after the addition of 0.5 mM BaCl2 for 2 min.
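The Ba2+-sensitive current is obtained by subtracting the current recorded after BaCl2 from the control current at each test potential. The reversal potential can then be read off the I-V curve where it crosses zero. The sketch below shows both steps with hypothetical currents at an illustrative subset of voltage steps (the step increment is not stated in the Methods). Linear interpolation between neighbouring steps is an assumption.

```python
def ba_sensitive_current(i_control, i_barium):
    """IK1 estimate: current before minus current after BaCl2, per voltage step."""
    return [c - b for c, b in zip(i_control, i_barium)]


def reversal_potential(voltages_mv, currents):
    """First zero crossing of an I-V curve, by linear interpolation (None if absent)."""
    points = list(zip(voltages_mv, currents))
    for (v0, i0), (v1, i1) in zip(points, points[1:]):
        if i0 == 0:
            return v0
        if i0 * i1 < 0:
            return v0 + (0.0 - i0) * (v1 - v0) / (i1 - i0)
    if points and points[-1][1] == 0:
        return points[-1][0]
    return None


# Hypothetical steady-state currents (pA) at a subset of test potentials (mV).
voltages = [-130, -110, -90, -70, -50]
control = [-10.0, -6.0, -2.0, 1.0, 1.5]
barium = [-1.0, -1.0, -1.0, -0.5, -0.5]
ik1 = ba_sensitive_current(control, barium)
print(ik1)                                # [-9.0, -5.0, -1.0, 1.5, 2.0]
print(reversal_potential(voltages, ik1))  # -82.0
```

The small outward current positive to the reversal potential in this toy example is the region where a real IK1 trace would show its characteristic negative slope.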
Dose-response curves. Drugs were diluted in standard medium (RPMI + B27).
Successively higher doses of each drug were administered at concentrations of
10⁻¹¹ M to 10-° M, in tenfold increments. Videos were captured >5 min after
each dose was administered, and processed using custom image-processing
software as described above. For chronotropic drugs, bright-field videos were taken
at each drug concentration so that the beat frequency could be
determined as a function of drug concentration. For inotropic drugs, tissues were
placed in the organ bath and force measurements were recorded as previously
described at each drug concentration so that the change in generated
force could be determined as a function of drug concentration. Dose-response
curves for these parameters were constructed by using MATLAB to
fit the Hill equation for sigmoid curves to the data, to determine the corresponding
EC50 value.
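The study fitted the Hill equation in MATLAB. As a dependency-free illustration of how EC50 follows from that equation, the sketch below uses a different and simpler technique, the linearized Hill plot. When the bottom and top plateaus are known, log10(f/(1 − f)) is a straight line in log10(concentration) with slope equal to the Hill coefficient n. The function names, the plateau values and the synthetic, noise-free data are assumptions. With noisy data, a direct nonlinear least-squares fit is usually preferable.

```python
import math


def hill(conc, ec50, n, bottom=0.0, top=1.0):
    """Sigmoid Hill equation: response rises from `bottom` to `top` around `ec50`."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** n)


def fit_hill_linearized(conc, response, bottom=0.0, top=1.0):
    """Estimate (EC50, Hill coefficient) from a Hill plot.

    With f = (response - bottom) / (top - bottom), the Hill equation gives
    log10(f / (1 - f)) = n * log10(conc) - n * log10(EC50), so a least-squares
    line through the transformed points has slope n and intercept
    -n * log10(EC50). The plateaus must be known in advance.
    """
    xs, ys = [], []
    for c, r in zip(conc, response):
        f = (r - bottom) / (top - bottom)
        if 0.0 < f < 1.0:  # the logit is undefined at or beyond the plateaus
            xs.append(math.log10(c))
            ys.append(math.log10(f / (1.0 - f)))
    if len(xs) < 2:
        raise ValueError("need at least two points strictly between the plateaus")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx
    intercept = mean_y - n * mean_x
    ec50 = 10.0 ** (-intercept / n)
    return ec50, n


# Synthetic, noise-free responses at tenfold-spaced doses (true EC50 10 nM, n = 1.5).
doses = [10.0 ** k for k in range(-11, -5)]
responses = [hill(d, 1e-8, 1.5) for d in doses]
ec50, n = fit_hill_linearized(doses, responses)
print("EC50 = %.3g M, Hill coefficient = %.3f" % (ec50, n))
```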


Paced isoproterenol response. Cardiac tissues were loaded with calcium dye as 
described above. Tissues were transferred to standard medium (RPMI + B27),
paced at 1 Hz for 30 min to equilibrate, and the baseline video recordings were then 
obtained. Successively higher doses of isoproterenol were administered directly to 
the standard medium at concentrations of 0.01, 100 and 1,000,000 nM. Videos of 
tissues were captured >10 min after each dose was administered, and processed 
using custom-designed image analysis software as described above. 

Cardiac hypertrophy model. Cardiac tissues were exposed to drugs known to
induce pathological hypertrophy (angiotensin II, endothelin-1, isoproterenol) on
day 6 following tissue formation. The first time point (1 week) was taken after 24 h
of incubation with or without the drug. Unless data from all three drugs are shown,
the HCM data presented are from tissues in which hypertrophy was induced with
endothelin-1, because the results were comparable among the three
HCM-inducing agents.

Statistics and reproducibility. Data are shown as mean ± 95% CI. Differences
between experimental groups were analysed by one-way or two-way ANOVA.
Post hoc pairwise analysis was done using Tukey’s HSD test. Electrophysiological
data were analysed by one-way ANOVA with Bartlett’s test and multiple comparisons.
P values < 0.05 were considered significant for all statistical tests. The reproducibility
of the data is demonstrated by the number of independent biological samples.
The number of independent experiments performed for each dataset reported
in the main figures is detailed in Supplementary Information, ‘Main figure data
sample sizes’. Details of the sample sizes used in the Extended Data are included
in the respective figure legends.
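As a reminder of what the one-way ANOVA compares, the sketch below computes the F statistic as the ratio of between-group to within-group mean squares. It is an illustration with hypothetical values, not the authors' analysis pipeline. Turning F into a P value requires the F distribution, and the Tukey HSD post hoc step requires the studentized range distribution. SciPy provides both, as `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`, which can serve as a cross-check.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three hypothetical groups of three measurements each.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```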

Reporting Summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. The study used a combination of commercial and open-source 
software packages, which are specified in the Methods, and custom-designed soft- 
ware that will be made available to interested investigators upon reasonable request. 
Data availability. Source data for quantitative data shown in all figure panels 
are available without restrictions and can be accessed at
https://doi.org/10.6084/m9.figshare.5765559. The detailed experimental protocol is available from Protocol
Exchange. 
Extended Data Fig. 1 | Experimental design and overall appearance of
cardiac tissues. a, Schematic of the pillars (purple) placed via interlocking
mating components between the bioreactor wells (grey) and pillar lid
(yellow) with tissues (pink) formed around the pillars, and electrodes
(black) placed perpendicular to the cardiac tissues. A glass slide (blue)
is epoxied to the bottom of the bioreactor to enable image acquisition.
b, A schematic of the assembled bioreactor. c, Photographs of the cardiac
tissues cultured within the bioreactor. d, The tissue pillar. e, Increase in
the electrical stimulation frequency throughout the intensity-training
regime (ramp to 6 Hz). f-h, Photographs of the tissues attached to pillars at
the end of four-week cultivation: side view (f, g), bottom view (h). Scale bars,
500 µm. i, j, Immunofluorescence in serial sections of the early-stage
intensity-trained tissue. The dotted yellow and red lines in g, i, j indicate
corresponding pillar placement within the tissue. Scale bars, 500 µm.
k-p, Immunofluorescence in serial sections of the early-stage
intensity-trained tissue in i. WGA, green; α-actinin, pink; nuclei, blue. Scale bars,
i-l, 500 µm; m, 100 µm; n, 20 µm; o, p, 50 µm. Images were selected to
include landmark features that facilitate localization and comparisons.
Similar results were obtained from three independent experiments.
Extended Data Fig. 2 | Enhanced gene expression and conduction
in intensity-trained cardiac tissues over time. a, Quantitative gene
expression in FCTs and C2A iPSC cardiac tissues after two weeks of
culture, as determined by RT-PCR; shown as fold change relative
to late-stage tissues at the start of stimulation. b, Quantitative gene
expression in early-stage cardiac tissues, normalized to GAPDH, from
three different iPSC lines as determined by RT-PCR after four weeks of
culture. n = 12 biologically independent samples per group; mean ± 95%
CI; no significant difference between the cell lines by two-way ANOVA.
c-f, Representative conduction velocity activation maps for early-stage
control (c), late-stage intensity-trained (d) and early-stage intensity-trained
(e) cardiac tissues, and surrogate of conduction velocity in early-stage
and late-stage C2A iPSC cardiac tissues after four weeks of culture,
assessed by calcium propagation (f). Mean ± s.e.m., n = 4-5 biologically
independent samples per group. g, h, Representative immunofluorescence
of gap junction (connexin-43 (Cx43), green) expression in early-stage
intensity-trained iPSC cardiac tissue after four weeks of culture, at low
(g; scale bar, 10 µm) and high magnification (h; scale bar, 5 µm). cTnT, red;
nuclei (DAPI), blue. Similar results were obtained from four independent
experiments.
Extended Data Fig. 3 | Electrophysiological characterization of
human engineered cardiomyocytes. a, Representative traces of action
potentials in early-stage control (n = 9), late-stage intensity-trained
(n = 9) and early-stage intensity-trained (n = 14) groups. n is the number
of biologically independent samples obtained during two independent
experiments. b, Representative traces of IK1 current for the early-stage
intensity-trained group using voltage-clamp mode. c-f, Electrophysiology
data after four weeks of culture showing the resting membrane potential (c),
peak amplitude (d), duration of action potential (e) and upstroke velocity
(f) obtained in two independent experiments, resulting in biologically
independent data from early-stage control (n = 9), late-stage intensity-trained
(n = 9) and early-stage intensity-trained (n = 14) groups.
**P < 0.01, *P < 0.05 using one-way ANOVA Bartlett’s test with multiple
comparisons. n.s., not significant. g-i, Representative continuous organ
bath force recordings under electrical pacing from 1-6 Hz from three
biologically independent early-stage intensity-trained tissues (C2A cells)
from one experiment. j, k, Representative continuous recordings from
an early-stage intensity-trained tissue (C2A cells) under electrical pacing
from 1-6 Hz of calcium (j) and surrogate force (k) as determined by tissue
displacement, normalized to 1 Hz.
Extended Data Fig. 4 | Enhanced maturation and synchronicity of
cardiac tissues in response to training regime as a function of time.
a-c, Representative contraction profiles of FCT (a), early-stage (b) and
late-stage cardiac tissues (c) over time (C2A cell line). d, Frequency
of contractions in cardiac tissues over four weeks of culture. n = 35
biologically independent samples over 16 independent experiments;
mean ± 95% CI, *P < 0.05 compared to control group by two-way
ANOVA with Tukey’s HSD test. Early-stage intensity-trained tissue
shows significant differences versus other training regimes by two-way
ANOVA with Tukey’s HSD test. e, Characterization of the cardiac cell
population within cardiac tissues (C2A line) after four weeks of culture by
fluorescence-activated cell sorting (FACS) analysis. f, g, Characterization
of cells isolated from early-stage intensity-trained cardiac tissues (C2A
line) by FACS analysis after four weeks of culture: cardiac cells (f; cTnT),
and supporting fibroblasts and endothelial cells (g; vimentin
and von Willebrand factor (vWF), respectively). h-j, Representative
immunofluorescence of whole tissues showing the enhanced cardiac
ultrastructure (α-actinin, green; cTnT, red; nuclei, blue) in early-stage
cardiac tissues from the C2A line (h), WTC11 cell line (i) and IMR90
cell line (j) after four weeks of culture. Scale bars, 5 µm; experiment
repeated independently 14 times with similar results. k, Representative
immunofluorescence showing the cell population in a histological section
from early-stage cardiac tissue (C2A line) after four weeks of culture. cTnT,
green; vimentin, red; nuclei, blue. Scale bar, 50 µm; experiment repeated
independently two times with similar results.
Extended Data Fig. 5 | Physiological hypertrophy within cardiac tissues
enhances contractility. a-c, Physiological hypertrophy of cardiomyocytes
cultured in the electromechanically conditioned cardiac tissue format
increases as a function of time and training regime beyond FCT levels,
as shown by cell elongation ratio (a) and sarcomere length (b). n = 326
biological replicates from 15 independent samples in one experiment.
c, As a result, the change in area during electrical pacing at 1 Hz,
an indirect measure of fractional shortening, similarly increases beyond
FCT levels as a function of time and training regime. Data represent the
ratio of the change in area at a given time point to the change in area at
day 6. n = 6 biologically independent samples per group; mean ± 95% CI;
*P < 0.05 compared to FCT group at week four by ANOVA with Tukey’s
HSD test; line above graph indicates P < 0.05 compared to other training
regimes by two-way ANOVA with Tukey’s HSD test. d, The enhanced
cardiac ultrastructure in intensity-trained early-stage cardiac tissues is
documented by the quantification of sarcomere distribution in cardiac
tissues. n = 12 biologically independent samples per group; mean ± 95%
CI. e, f, Representative immunofluorescence of gap junctions (connexin-43
(Cx43), white) in early-stage iPSC cardiac tissue (β-myosin heavy
chain (β-MHC), green; cTnT, red; nuclei (DAPI), blue) (e) and cardiac
ultrastructure in early-stage iPSC cardiac tissue (α-actinin, green; cTnT,
red; nuclei (DAPI), blue) (f) after four weeks of culture. Scale bar, 50 µm;
experiment repeated independently three times with similar results.
g, α-Actinin immunofluorescence (white) in cardiac tissues after four
weeks of culture. Scale bar, 10 µm; experiment repeated independently two
times with similar results.
Extended Data Fig. 6 | Enhanced ultrastructural properties of cardiac
tissues following intensity training. a, Representative transmission
electron microscopy (TEM) images for FCTs, adult cardiac tissue and early-stage
cardiac tissues (C2A line) using different electromechanical conditioning
protocols, after four weeks of culture. Scale bar, 500 nm. b, TEM images
of intensity-trained early-stage cardiac tissues (C2A line) after four weeks
of culture showing details of various ultrastructural elements. Scale bar,
500 nm. Similar results to those in a and b were obtained independently
with the following cells or treatments: FCT (n = 8), adult (n = 2), control
(n = 3), constant (n = 3), intensity-trained (n = 4).
Extended Data Fig. 7 | Intensity training of cardiac tissues derived
from early-stage hiPS-CMs is required to enhance mitochondrial
development. a, Representative immunofluorescence showing
ultrastructural proteins WGA (green), α-actinin (red), mitochondria
(blue) and oxidative phosphorylation (yellow) for early-stage cardiac
tissues (C2A cell line) at different culture times during exposure to the
intensity-training electromechanical-conditioning regime. Scale bar,
20 µm. b, Representative immunofluorescence showing ultrastructural
proteins WGA (green), α-actinin (red), mitochondria (blue) and oxidative
phosphorylation (yellow) in cardiac tissues cultured with intensity
training for four weeks from early-stage hiPS-CMs (C2A cell line), late-stage
hiPS-CMs (C2A cell line) and GW19 FCT. Scale bar, 20 µm. Similar
results to those in a and b were obtained independently from the following
experiments: FCT (n = 5), early-stage intensity-trained (n = 3), late-stage
intensity-trained (n = 3). c, d, Representative TEM images for early-stage
(c) and late-stage cardiac tissues (d) (C2A cell line) after two weeks of
exposure to the intensity-training electromechanical-conditioning regime.
Scale bar, 1 µm; experiment not repeated independently.
Extended Data Fig. 8 | Formation of T-tubules in early-stage intensity-trained
cardiac tissues. a-e, Axial tissue cross-sections from intensity-trained
cardiac tissues (C2A line) after four weeks of culture showing
T-tubules (WGA, green) and nuclei (DAPI, blue) at low magnification
(a; scale bar, 100 µm), medium magnification (b and c; scale bar, 10 µm)
and high magnification (d and e; scale bar, 5 µm). f, g, Axial tissue
cross-sections showing T-tubules (WGA, green), actin (red) and DAPI (blue) in
intensity-trained cardiac tissues (C2A line) after four weeks of culture (f)
and GW19 FCT (g). Scale bar, 10 µm. h, Immunofluorescence of paraffin-embedded
and sectioned cardiac tissues from three different iPSC cell
lines (C2A, WTC11, IMR90) after four weeks of intensity training showing
the formation of T-tubules (confirmed by both WGA staining and
di-8-ANEPPS staining), and striated ultrastructure (actin). Scale bar,
10 µm. Similar results to those in a-h were obtained in a minimum of four
independent experiments.
Extended Data Fig. 9 | Intensity training upregulates cardiac
maturation in early-stage tissues through enhanced calcium handling.
a, b, Intensity training promotes T-tubule formation in early-stage
hiPS-CM tissues, as demonstrated by immunofluorescence of ryanodine
receptor 2 (RYR2, green), bridging integrator 1 (BIN1, blue) and T-tubule
staining (di-8-ANEPPS, red). Scale bar, 10 µm. c, Expression of ATP2A2
and SLC8A1, genes responsible for maintaining proper calcium
homeostasis, in early-stage tissues as determined by RT-PCR and
normalized to GAPDH over four weeks of culture with the designated
stimulation regime. Independent biological replicates per group: FCT,
n = 8; control, n = 6; constant, n = 6; intensity-trained, n = 14; adult, n = 1.
Mean ± 95% CI; *P < 0.05 versus FCT group at week four by ANOVA
with Tukey’s HSD test. Line over graph indicates P < 0.05 compared
to other training regimes by two-way ANOVA with Tukey’s HSD test.
d, Relaxation times in early-stage tissues as characterized by the full-width
half-maximum (FWHM) values and the decay time (time from the peak
of the calcium transient to 90% decay). Independent biological
replicates per group: FCT, n = 8; C2A, n = 12; WTC11, n = 6; IMR90,
n = 6. Mean ± 95% CI; *P < 0.05 versus FCT group by ANOVA with
Tukey’s HSD test. Line over graph indicates P < 0.05 between cell lines by
two-way ANOVA. e, Representative calcium traces of early-stage tissues treated
with 1 µM nifedipine. f, g, Representative traces of calcium release after
stimulation with 5 mM caffeine in early-stage tissues and FCTs treated
with 1 mM verapamil (f) or 2 µM thapsigargin (g). h, Representative traces
of calcium release after stimulation with 5 mM caffeine for early-stage
tissues and FCTs. i, Calcium spikes detected by fluorescent calcium dyes
in early- and late-stage tissues (C2A line) after four weeks of culture at
two calcium concentrations. j, Intensity-trained early-stage but not late-stage
tissues (C2A line) after four weeks of culture respond to ryanodine
(1 µmol l⁻¹). k, The force-frequency relationship of early-stage intensity-trained
cardiac tissues (C2A line) after four weeks of culture, treated with
the RYR2 blocker ryanodine (1 µM) or the SERCA2a blocker thapsigargin
(1 µM). Directly measured force data; n = 13 biologically independent
samples for intensity group and n = 3 biologically independent samples for
other groups. Mean ± 95% CI; line over graph indicates P < 0.05 by two-way
ANOVA.
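The two relaxation metrics in panel d (FWHM and peak-to-90%-decay time) can be computed from a single sampled calcium transient as sketched below. This is a minimal illustration under stated assumptions, not the study's analysis software. The baseline is taken as the minimum before the peak, crossings are found by linear interpolation between samples, and the trace is a synthetic triangle. The function names are hypothetical.

```python
def _interp_time(t0, y0, t1, y1, level):
    """Time at which the straight line between two samples reaches `level`."""
    return t0 + (level - y0) * (t1 - t0) / (y1 - y0)


def transient_kinetics(times, signal):
    """Return (FWHM, time from peak to 90% decay) for one calcium transient."""
    peak = max(range(len(signal)), key=signal.__getitem__)
    baseline = min(signal[: peak + 1])  # assumption: baseline = minimum before peak
    amplitude = signal[peak] - baseline
    if amplitude <= 0:
        raise ValueError("no transient above baseline")
    half = baseline + 0.5 * amplitude
    level90 = signal[peak] - 0.9 * amplitude  # 90% of the way back to baseline

    # Rising edge: walk back from the peak until the signal drops below half-max.
    i = peak
    while i > 0 and signal[i - 1] >= half:
        i -= 1
    if i == 0:
        raise ValueError("trace starts above half-maximum")
    t_rise = _interp_time(times[i - 1], signal[i - 1], times[i], signal[i], half)

    def falling_crossing(level):
        # Walk forward from the peak until the next sample is at or below `level`.
        j = peak
        while j < len(signal) - 1 and signal[j + 1] > level:
            j += 1
        if j == len(signal) - 1:
            raise ValueError("trace ends before decaying to the requested level")
        return _interp_time(times[j], signal[j], times[j + 1], signal[j + 1], level)

    fwhm = falling_crossing(half) - t_rise
    decay90 = falling_crossing(level90) - times[peak]
    return fwhm, decay90


# Synthetic triangular transient sampled once per time unit.
t = list(range(13))
trace = [0, 0, 2, 4, 6, 8, 10, 8, 6, 4, 2, 0, 0]
print(transient_kinetics(t, trace))  # (5.0, 4.5)
```

The same function applies to the R90 relaxation time in Extended Data Fig. 10b, provided the decay is measured from the peak in the same way.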
Extended Data Fig. 10 | Intensity training in early-stage tissues
enables physiologically relevant drug responses and the development
of a pathological hypertrophy disease model. a, b, Calcium intensity
measurements (a) and relaxation time obtained by measuring the time
from the peak to 90% relaxation (R90) (b) during electrical pacing at
1 Hz in early-stage intensity-trained tissues (C2A line) after four weeks of
culture with increasing doses of isoproterenol. n = 20 biological replicates
from six independent experiments. Mean ± 95% CI; *P < 0.05 versus
baseline response by ANOVA with Tukey’s HSD test. c, Cell area over four
weeks of culture for the designated stimulation regime. n = 10 biological
replicates from five independent experiments. Mean ± 95% CI; line above
graph indicates P < 0.05 compared to other training regimes by two-way
ANOVA with Tukey’s HSD test. d, Frequency of contractions in healthy
(C2A) and hypertrophic (HCM) heart tissues over four weeks of culture.
n = 12 independent biological samples from five independent experiments.
Mean ± 95% CI. e, Relaxation times in early-stage tissues (C2A line) and
early-stage hypertrophy tissues (HCM) as characterized by FWHM values
and the decay time (time from the peak of the calcium transient to 90% decay).
n = 20 biological replicates from four independent experiments.
Mean ± 95% CI; lines above graphs indicate P < 0.05 compared to other
training regimes by two-way ANOVA with Tukey’s HSD test. f, Early-stage
intensity-trained hypertrophy tissues exhibit impaired frequency-dependent
acceleration of relaxation (FDAR), as shown for each stimulation frequency
by individual traces of calcium peaks.
Distributed hepatocytes expressing telomerase
repopulate the liver in homeostasis and injury

Shengda Lin, Elisabete M. Nascimento, Chandresh R. Gajera, Lu Chen, Patrick Neuhöfer, Alina Garbuzov,
Sui Wang & Steven E. Artandi


Hepatocytes are replenished gradually during homeostasis and 
robustly after liver injury!”. In adults, new hepatocytes originate 
from the existing hepatocyte pool*-’, but the cellular source of 
renewing hepatocytes remains unclear. Telomerase is expressed in 
many stem cell populations, and mutations in telomerase pathway 
genes have been linked to liver diseases”"''. Here we identify a 
subset of hepatocytes that expresses high levels of telomerase 
and show that this hepatocyte subset repopulates the liver during 
homeostasis and injury. Using lineage tracing from the telomerase 
reverse transcriptase (Tert) locus in mice, we demonstrate that 
rare hepatocytes with high telomerase expression (TERT^High
hepatocytes) are distributed throughout the liver lobule. During 
homeostasis, these cells regenerate hepatocytes in all lobular zones, 
and both self-renew and differentiate to yield expanding hepatocyte 
clones that eventually dominate the liver. In response to injury, the 
repopulating activity of TERT^High hepatocytes is accelerated and
their progeny cross zonal boundaries. RNA sequencing shows 
that metabolic genes are downregulated in TERT^High hepatocytes,
indicating that metabolic activity and repopulating activity may 
be segregated within the hepatocyte lineage. Genetic ablation of 
TERT^High hepatocytes combined with chemical injury causes a
marked increase in stellate cell activation and fibrosis. These results 
provide support for a ‘distributed model’ of hepatocyte renewal 
in which a subset of hepatocytes dispersed throughout the lobule 
clonally expands to maintain liver mass. 

Hepatocytes execute the metabolic activities of the liver and show 
functional heterogeneity along the axis within the lobule defined from 
the portal vein to the central vein!?, At the extreme ends of this axis, 
pericentral AXIN2+ hepatocytes repopulate the liver during normal
homeostasis!°, whereas periportal hepatocytes marked by SOX9 expres- 
sion are inactive during homeostasis but expand in response to chronic 
chemical damage"*. Observations indicating that proliferating hepat- 
ocytes are located throughout the lobule’*’* suggest that additional 
sources of repopulating hepatocytes exist. Telomerase synthesizes tel- 
omere repeats and has been linked to long-term renewal in stem cells 
and cancers’’. Germline inactivating mutations in telomerase genes 
predispose humans and mice to cirrhosis”-"', while activating muta- 
tions in the TERT promoter represent the most recurrent mutations in 
hepatocellular carcinoma'®. Given the important roles of telomerase 
in liver disease, and observations that telomerase is found in stem cell 
compartments in multiple adult tissues!°-”*, we hypothesized that tel- 
omerase may be expressed in liver cells with unique properties. 

To identify telomerase-expressing cells in vivo, we engineered a
mouse strain expressing the inducible CreERT2 recombinase from
the endogenous Tert locus (Extended Data Fig. 1a-d). Treatment of
Tert^CreERT2/+ knock-in mouse embryonic stem (ES) cells in culture
with 4-hydroxytamoxifen resulted in efficient recombination of a fluorescent
reporter (Extended Data Fig. 1e-g). To study the adult liver,
we crossed Tert^CreERT2/+ mice with a Rosa26^LSL-Tomato reporter strain
that enables permanent cell labelling by deletion of a transcriptional
stop element flanked by loxP sites and concomitant expression of fluorescent
Tomato protein. Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice were
injected with a near-saturating dose of tamoxifen (1 mg per 10 g body
weight; Extended Data Fig. 1i), and analysed 3 days later (Fig. 1a). We
found that a subset of cells throughout the liver expressed Tomato and
the hepatocyte marker HNF4A (Fig. 1b). Tomato expression was not
detected in other liver cell types (Extended Data Fig. 1k-n). To isolate
these TERT^High hepatocytes by fluorescence-activated cell sorting
(FACS) (see Supplementary Information), we labelled all hepatocytes
with an adeno-associated virus expressing hepatocyte-specific GFP
(AAV-GFP)” (Fig. 1c and Extended Data Fig. 1h). We found that all
Fig. 1 | Identification of a hepatocyte subpopulation with elevated
Tert and telomerase activity. a, b, Immunofluorescence analysis of
Tert^CreERT2/+ Rosa26^LSL-Tomato/+ livers treated with a single dose of tamoxifen
and analysed 3 days later (timeline shown in a; d, day). Tomato (red),
HNF4A (green), CK19 (white) and DAPI (blue) shown. c-g, Analysis
of telomerase expression in FACS-sorted hepatocytes. c, Timeline.
d, Representative FACS plot. e, TRAP assay (B, buffer only). f, Position
of primer pairs for RT-qPCR. g, Fold change in Tert mRNA expression
between TERT^High and TERT^Low hepatocytes. n = 3 mice, each indicated
by a unique shape; horizontal bars show mean. Experiments repeated three
times for b, more than five times for d, and twice for e and g. Scale bars,
50 µm.
Fig. 2 | TERT^High hepatocytes repopulate the liver in homeostasis and show downregulation of metabolic genes. a–i, Lineage tracing in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice treated with single-dose tamoxifen (b–g) or oil vehicle (i) and analysed after indicated tracing periods by immunofluorescence for Tomato. Timeline shown in a; d, days; m, months; y, years. h, Quantification of Tomato+ hepatocyte area (n = 4, 5, 4, 4, 7, 5 mice for each time point from left to right; horizontal bars show mean). j–l, Co-immunofluorescence for Tomato and GS in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice traced for 3 days, 6 months or 1 year. m, Quantification of the Tomato+GS+ fraction of GS+ hepatocytes (n = 4, 5, 5 mice for each time point; horizontal bars show mean). n, o, RNA-seq results for FACS-purified TERT^High (Tomato+) or TERT^Low (Tomato−) hepatocytes (n = 3 mice for each group). n, Volcano plot for enriched genes and Gene Ontology (GO) terms (cut-offs: q < 0.05, |log2(fold difference)| > 0.8). o, Gene set enrichment analysis (GSEA) for enriched gene sets (number of genes shown for each gene set). Red, enriched in TERT^High cells; grey, enriched in TERT^Low cells. Experiments repeated twice for time points in b, f, g, j–l. Scale bars, 100 µm.

[Figure 2 panels: tracing images (including 3 months, 6 months and 1 year (oil)) stained for Tomato/DAPI or Tomato/GS/DAPI; n, volcano plot, x-axis log2(fold difference), labelled "Up in TERT^Low" and "Up in TERT^High", highlighting GO: positive regulation of cell proliferation, GO: structural constituent of ribosome and GO: electron transport chain; o, GSEA, x-axis normalized enrichment score, gene sets including Hallmark xenobiotic metabolism, Ras protein signal transduction, transmembrane receptor activity, KEGG glycolysis gluconeogenesis, mitochondrion, translation, KEGG ribosome, Hallmark adipogenesis, electron carrier activity, Ramalho stemness up, Hallmark apical junction, PID telomerase pathway, cell cycle GO 0007049 and Biocarta MAPK pathway.]


Tomato+ cells were also GFP+, typically representing 3–5% of all hepatocytes from 2-month-old mice (Fig. 1d). Telomeric repeat amplification protocol (TRAP) showed a fivefold increase in telomerase activity in the TERT^High population (GFP+Tomato+) compared with the TERT^Low population (GFP+Tomato−) (Fig. 1e, Extended Data Fig. 1j and Supplementary Fig. 1). Quantitative reverse transcription PCR (RT-qPCR) showed that there was 12.9-fold more Tert mRNA in the TERT^High population than in the bulk TERT^Low hepatocyte population (Fig. 1f, g). Both populations comprised a similar distribution of diploid and polyploid cells (Extended Data Fig. 2). These data show that Tert mRNA and telomerase activity are elevated in TERT^High hepatocytes.

To determine whether TERT^High hepatocytes repopulate the liver during homeostasis, we performed lineage tracing by injecting two-month-old Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice with a single dose of tamoxifen (1 mg per 10 g body weight) or oil vehicle and allowed these animals to age for up to 1 year (Fig. 2a). TERT^High hepatocytes represented 2.8 ± 0.4% of liver area 3 days after tamoxifen treatment, but the Tomato+ progeny of these cells increased progressively during the tracing period to comprise 29.9 ± 2.4% of liver area at 1 year (Fig. 2b–h). All Tomato+ cells detected after 1 year were HNF4A+ hepatocytes (Extended Data Fig. 3a–e), and Tomato+ cells were not detected in mice treated with oil vehicle (Fig. 2i). A single tamoxifen injection generated a similar number of Tomato+ hepatocytes as three injections administered at 5-week intervals over the same tracing period (Extended Data Fig. 3f–i), indicating that elevated Tert promoter activity is an intrinsic feature of cell identity. Co-staining of sections from this lineage tracing time course for Tomato and the pericentral zone marker glutamine synthetase (GS) showed that TERT^High hepatocytes were distributed throughout all lobular zones. The vast majority of TERT^High hepatocytes were located in the periportal and midlobular zones (3-day trace), and the progeny from these cells expanded markedly to replenish hepatocytes in these zones. Within the pericentral zone, the TERT^High lineage comprised 1.8 ± 0.3% of cells at 3 days (Fig. 2j), but increased over time (8.2 ± 0.5% at 6 months and 12.7 ± 0.9% at 1 year; Fig. 2k–m and Extended Data Fig. 4). Analysis of proliferating hepatocyte position by Ki-67 immunostaining revealed that Ki-67+ hepatocytes were dispersed throughout all lobular zones in both wild-type and Tert^CreERT2/+ mice, matching the distributed pattern of TERT^High hepatocytes (Extended Data Fig. 5a, b, h). These data show that rare TERT^High hepatocytes drive a marked and progressive repopulation of the hepatocyte lineage throughout the lobule during normal homeostasis.
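The liver-area percentages above rest on the area index defined in the Methods: the liver area covered by Tomato+ cells as a percentage of total area, measured in ImageJ. As a minimal sketch, assuming the image has already been thresholded into a Tomato+ pixel mask and a tissue mask (both hypothetical inputs that are not part of the described pipeline), the calculation reduces to a pixel count:

```python
from typing import List

Mask = List[List[bool]]


def area_index(tomato_mask: Mask, tissue_mask: Mask) -> float:
    """Return Tomato+ area as a percentage of tissue area.

    Both masks are hypothetical boolean images of identical shape; pixels
    outside the tissue mask (e.g. vessel lumens, background) are ignored.
    """
    if len(tomato_mask) != len(tissue_mask):
        raise ValueError("masks must have the same number of rows")
    tissue_px = 0
    tomato_px = 0
    for t_row, s_row in zip(tomato_mask, tissue_mask):
        if len(t_row) != len(s_row):
            raise ValueError("masks must have the same shape")
        for is_tomato, is_tissue in zip(t_row, s_row):
            if is_tissue:
                tissue_px += 1
                if is_tomato:
                    tomato_px += 1
    if tissue_px == 0:
        raise ValueError("tissue mask contains no pixels")
    return 100.0 * tomato_px / tissue_px


# Toy 2 x 5 image: 10 tissue pixels, 3 of them Tomato+.
tomato = [[True, False, False, False, False],
          [True, True, False, False, False]]
tissue = [[True] * 5, [True] * 5]
print(area_index(tomato, tissue))  # 30.0
```

Restricting the denominator to tissue pixels keeps empty background from diluting the percentage; how the original analysis handled background is not stated.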

To understand how TERT^High cells differ from bulk hepatocytes, we performed RNA sequencing (RNA-seq) on TERT^High and TERT^Low
Fig. 3 | TERT^High hepatocytes drive clonal expansion by a self-renewal mechanism. a–h, 3D analysis of sparsely labelled hepatocyte clones in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice treated with low-dose tamoxifen and traced for 3 days, 3 months or 6 months. a, Timeline. b–d, Clone sizes (b, c) and clone number per volume (d) (n = 5, 3 and 4 mice, respectively; each mouse represented by unique dot colour in c; horizontal bars show mean). e–g, Co-immunofluorescence for Tomato and GS to assess clone

hepatocytes isolated by FACS from three Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice, 3 days after tamoxifen treatment. RNA-seq identified 3,172 genes that were differentially expressed between the two populations (q < 0.05; Fig. 2n, Extended Data Fig. 3j). Gene Ontology analysis (Fig. 2n) and the Database for Annotation, Visualization and Integrated Discovery (DAVID, Extended Data Fig. 3j) showed that cell cycle genes were upregulated in the TERT^High population, whereas ribosomal genes and mitochondrial genes were upregulated in the TERT^Low population. Gene set enrichment analysis (GSEA) revealed increased representation of gene sets associated with cell division and receptor tyrosine kinase activity in the TERT^High population (Fig. 2o, red), and decreased representation of gene sets associated with ribosome components, mitochondrial proteins, the electron transport chain and hepatocyte metabolic activities (Fig. 2o, grey). Proliferation, as measured by 5-ethynyl-2′-deoxyuridine (EdU) incorporation (EdU given for 7 days in drinking water; Extended Data Fig. 6), was higher in TERT^High hepatocytes than in TERT^Low hepatocytes (6.4 ± 1.0% versus 0.9 ± 0.1%). Together, these data suggest that TERT^High hepatocytes are less invested in the metabolic and synthetic functions of bulk hepatocytes, and more dedicated to proliferation and homeostatic renewal, than TERT^Low hepatocytes.
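The 3,172-gene count uses q < 0.05 alone, whereas the volcano plot in Fig. 2n additionally requires |log2(fold difference)| > 0.8. The sketch below shows only that filtering step, assuming a hypothetical table of per-gene results (for example, exported from DESeq2) and assuming log2 fold differences are expressed as TERT^High over TERT^Low; gene names and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GeneResult:
    name: str
    log2_fold: float  # assumed convention: log2(TERT-High / TERT-Low)
    q_value: float    # multiple-testing-adjusted P value


def passes_cutoffs(g: GeneResult, q_max: float = 0.05,
                   min_abs_log2: float = 0.8) -> bool:
    """Apply the volcano-plot cut-offs quoted in the Fig. 2n legend."""
    return g.q_value < q_max and abs(g.log2_fold) > min_abs_log2


def split_by_direction(results: Iterable[GeneResult]) -> Tuple[List[str], List[str]]:
    """Return (up in TERT-High, up in TERT-Low) names of genes passing both cut-offs."""
    up_high: List[str] = []
    up_low: List[str] = []
    for g in results:
        if passes_cutoffs(g):
            (up_high if g.log2_fold > 0 else up_low).append(g.name)
    return up_high, up_low


# Hypothetical rows, not values from the study.
rows = [
    GeneResult("geneA", 1.5, 0.001),   # passes, up in TERT-High
    GeneResult("geneB", -2.0, 0.01),   # passes, up in TERT-Low
    GeneResult("geneC", 0.5, 0.001),   # fold difference too small
    GeneResult("geneD", 3.0, 0.2),     # q value too large
]
print(split_by_direction(rows))  # (['geneA'], ['geneB'])
```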
To characterize the behaviour of single TERT^High hepatocytes and their progeny through clonal analysis and sparse labelling, we injected
location in 6-month trace samples. h, Quantification of clone size and position relative to GS+ cells. i–r, Single-molecule RNA FISH on FACS-purified TERT^High-derived (l–n) and TERT^Low-derived (o–q) hepatocytes. r, Quantification by number of foci (n = 3 mice per time point, each mouse indicated by unique shape). Bars and error bars are mean ± s.e.m. For cells with more than five foci, P(1 month vs 3 days) = 0.56, P(1 year vs 3 days) = 2.7 × 10^-7, P(1 year vs 1 month) = 4.3 × 10^-5. Experiments repeated twice. Scale bars, 50 µm.


Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice with a lower dose of tamoxifen (0.08 mg per 10 g body weight), and performed lineage tracing for 3 days, 3 months or 6 months (Fig. 3a). Confocal microscopy was performed on thick tissue sections and followed by three-dimensional reconstruction (Fig. 3b). The average clone size increased progressively from single cells at 3 days to 2.1 ± 0.2 cells at 3 months and 4.2 ± 0.4 cells at 6 months (Fig. 3c). Average clonal density did not change, indicating that there was no significant loss of TERT^High hepatocyte clones over the 6-month trace (Fig. 3d). The irregular shape of these clones matches the anatomical organization of hepatocytes within hepatic cords. Co-staining of 6-month trace samples with antibodies against GS revealed that the vast majority of clones resided outside the GS+ zone (Fig. 3e, h (red bars)), and a subset of these bordered the GS+ pericentral zone (Fig. 3f, h (green bars)). We also found occasional clones comprising a mixture of GS+ and GS− cells (Fig. 3g, h (blue bars)). These 'cross-zone' clones derive from TERT^High hepatocytes but comprise cells with two distinct zonal fates. These clonal studies matched the findings on homeostatic expansion of the TERT^High lineage (Fig. 2), and further supported the idea that TERT^High hepatocytes are a key source of hepatocyte renewal.

TERT^High hepatocytes could generate clones either by a self-renewal and differentiation mechanism, in which daughter cells are comprised
Fig. 4 | TERT^High hepatocytes are critical for liver regeneration. a–f, Single-dose CCl4-induced liver injury. a, Timeline. b–e, Immunofluorescence for Tomato and GS of cells from Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice treated with oil vehicle (b, c) or CCl4 (d, e) at day 7 post-treatment. White lines show GS+ pericentral zone. f, Quantification (n = 5 mice). g–j, DDC diet-induced injury in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice. g, Timeline. h, i, Immunofluorescence of liver for Tomato after 30 days of treatment with normal diet (h) or DDC diet (i). j, Quantification (n = 4 mice). k–m, Ablation of TERT^High


of both TERT^High and TERT^Low cells (Fig. 3j), or by a simple duplication mechanism, in which all daughter cells remain TERT^High (Fig. 3k). To distinguish between these possibilities, we examined Tert mRNA with single-molecule RNA fluorescence in situ hybridization (FISH) on sorted Tomato+ and Tomato− hepatocytes from different tracing periods (Fig. 3i, l–r), as well as wild-type hepatocytes (Extended Data Fig. 7). The percentage of Tomato+ cells with high Tert mRNA (more than five mRNA foci) was comparable at 3 days and 1 month (80.3 ± 2.0% versus 75.3 ± 4.8%), but decreased to 18.0 ± 2.2% after 1 year. Tomato− cells remained TERT^Low, regardless of the tracing period. The presence of rare cells in this fraction with high Tert mRNA is likely to indicate incomplete recombination with CreERT2. These studies indicate that the TERT^High subpopulation both self-renews to replenish TERT^High cells and differentiates to yield TERT^Low daughter cells.
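The FISH readout above classifies a cell as having high Tert mRNA when it shows more than five foci, and reports the percentage of such cells. A minimal sketch of that scoring step, assuming foci have already been counted per cell (the counts below are hypothetical):

```python
from typing import Sequence


def percent_tert_high(foci_per_cell: Sequence[int], threshold: int = 5) -> float:
    """Percentage of cells with more than `threshold` Tert mRNA foci.

    The strict '>' mirrors the 'more than five foci' criterion, so a cell
    with exactly five foci is scored as low.
    """
    if not foci_per_cell:
        raise ValueError("no cells were scored")
    high = sum(1 for n in foci_per_cell if n > threshold)
    return 100.0 * high / len(foci_per_cell)


# Hypothetical foci counts for eight sorted Tomato+ cells.
counts = [0, 2, 6, 9, 12, 5, 7, 8]
print(percent_tert_high(counts))  # 62.5
```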

To understand the ability of TERT^High hepatocytes to replace damaged cells in the pericentral zone, we eliminated pericentral hepatocytes by single-dose carbon tetrachloride (CCl4) injection (Extended Data Fig. 8c–f). Although TERT^High hepatocytes are rare within the GS+ pericentral zone, there was a marked increase in the number of GS+Tomato+ cells 7 days after injury (Fig. 4a–f). These data indicate that injury to pericentral hepatocytes activates nearby TERT^High hepatocytes, and that their progeny assume a new zonal identity in healing pericentral wounds. To understand whether TERT^High hepatocytes contribute to hepatocyte regeneration after global injury, we challenged the livers with a 0.1% 3,5-diethoxycarbonyl-1,4-dihydrocollidine (DDC) diet (Fig. 4g and Extended Data Fig. 8g, h). There was a significant expansion of Tomato+ hepatocytes after 1 month of the DDC diet (38.0 ± 3.2% versus 5.6 ± 0.3% in control livers, P = 0.0018; Fig. 4g–j). Some progeny of TERT^High cells adopted a ductal fate (Extended Data Fig. 9), consistent with known hepatocyte plasticity following DDC-induced injury. These findings reveal that TERT^High hepatocytes repopulate hepatocytes at an accelerated rate in the setting of chemical injury.


hepatocytes via AAV-lsl-DTA injection into Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice (n = 4 mice). k, Timeline. l, The AAV-lsl-DTA construct. m, Quantification of Tomato+ cells. n–w, Genetic ablation followed by DDC injury in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice. n, Timeline. o–w, Livers analysed by Sirius Red for collagen (o–q), SMA for activated stellate cells (r–t), and CK19 (u–w) (n = 4 mice). o, r, u, AAV-GFP-injected control animals; p, s, v, AAV-lsl-DTA-injected animals. Horizontal bars show mean. Experiments repeated at least twice. Scale bars, 100 µm.


To determine whether TERT^High hepatocytes are required for normal injury responses, we ablated Tert-expressing hepatocytes using a diphtheria toxin A (DTA)-based AAV system, in which hepatocyte-specific expression of DTA is induced upon Cre-mediated deletion of a loxP-EGFP-Stop-loxP element (Fig. 4k–m). Intravenous infection of wild-type mice with both AAV-lsl-DTA and AAV-Cre resulted in massive hepatocyte necrosis and death within 6 days, whereas infection with AAV-lsl-DTA alone was well tolerated for up to 2 months and did not induce liver damage (Extended Data Fig. 10f–j). When we used this system in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice, the abundance of TERT^High (Tomato+) cells was reduced by 75.1% in mice treated with AAV-lsl-DTA compared with those treated with AAV-GFP (Fig. 4m). After ablating TERT^High cells, we induced liver injury with a DDC diet for 30 days (Fig. 4n). Expansion of the TERT^High (Tomato+) cell lineage was substantially suppressed in mice treated with AAV-lsl-DTA (Fig. 4s, v) compared with those treated with AAV-GFP (Fig. 4r, u). DDC treatment following ablation of TERT^High hepatocytes led to a marked increase in liver fibrosis, shown by an increase in collagen deposition (Fig. 4o–q) and a concomitant increase in the number of activated stellate cells (Fig. 4r–t). There was an associated increase in cells positive for the ductal marker CK19 (Fig. 4u–w), indicating that suppression of hepatocyte renewal enhances the ductal reaction characteristic of DDC treatment. Finally, we replicated these results using an independently constructed AAV that allows induction of DTA through Cre-mediated inversion and deletion steps (AAV-flex-DTA) (Extended Data Fig. 10). Together, these data show that TERT^High hepatocytes are critical for normal liver regeneration in the setting of DDC injury and that regeneration in their absence results in elevated stellate cell activation and fibrosis.
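The text reports a 75.1% reduction in TERT^High (Tomato+) cells after AAV-lsl-DTA treatment but does not spell out the calculation. One plausible reading, assumed here rather than taken from the Methods, is a comparison of group means; the per-mouse values below are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_reduction(control: Sequence[float], treated: Sequence[float]) -> float:
    """Percentage drop of the treated-group mean relative to the control-group mean."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero")
    return 100.0 * (1.0 - mean(treated) / control_mean)


# Hypothetical Tomato+ fractions (% of hepatocytes), one value per mouse.
aav_gfp = [4.0, 3.6, 4.4, 4.0]
aav_dta = [1.0, 0.8, 1.2, 1.0]
print(round(percent_reduction(aav_gfp, aav_dta), 1))  # 75.0
```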

On the basis of the dispersed location of TERT^High hepatocytes and their clonal behaviour during regeneration, we propose a 'distributed model' to explain hepatocyte renewal. According to this model, rare TERT^High hepatocytes located throughout the lobule form enlarging clones during homeostasis in response to hepatocyte loss, and this response is accelerated during liver injury (Fig. 5). These findings provide a framework that can explain several longstanding observations in hepatocyte renewal, including the ability of the liver to recover from injuries in any lobular zone; a general lack of evidence for long-range migration of hepatocytes; and the presence of rare proliferating hepatocytes throughout the lobule. Our RNA-seq data suggest that repopulating activity and metabolism may be segregated within the hepatocyte population. Telomerase activity is critical for preserving long-term cell division and chromosomal stability. Maintaining the liver using a subset of hepatocytes with elevated telomerase and reduced metabolic activity may be important for long-term tissue maintenance, for preventing the accrual of DNA damage caused by reactive oxygen species and for suppressing hepatocellular carcinoma. We speculate that depletion or dysfunction of an analogous subset of repopulating hepatocytes in humans may underlie the pathophysiology of cirrhosis. Strategies to mitigate this cellular depletion may prove useful in treating cirrhosis of diverse aetiologies.


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
METHODS


Generation of the Tert^CreERT2 knock-in line. The targeting vector was generated by serial recombineering and Gateway cloning. Homology arms (mm10 chr13: 73,621,344–73,631,102) were cloned from the BAC (RP24-342018) via recombineering. A codon-optimized intron-CreERT2-NeoR cassette was inserted into the endogenous translational start site of Tert (mm10 chr13: 73,627,032–73,627,033) via recombineering. The final targeting vector was created via Gateway cloning into the pWS-TK2 vector with thymidine kinase cassettes at both ends of the homology arms, as previously described. The targeting vector was linearized and electroporated into JM8/F6 mouse ES cells. Correctly targeted ES clones were selected by Southern blotting and karyotyping, and then injected into ICR/CD-1 blastocysts to generate the knock-in line. Tert^CreERT2/+ mice were born at normal Mendelian frequency. To verify the efficacy of CreERT2 in the ES cells, the Tert^CreERT2/+ clone was targeted with a modified Rosa26-mTmG targeting vector using HygroR as the selection gene. The double knock-in cells were treated with 500 nM 4-hydroxy tamoxifen (4-OHT) to evaluate recombination efficiency.

AAV production. All AAVs used in this study were produced with cis-plasmids containing the full TBG promoter (two copies of the α1-microglobulin/bikunin precursor (AMBP) enhancer elements followed by the promoter of the SERPINA7 gene and a mini-intron), an AAV8 serotype packaging plasmid, and an adenovirus helper plasmid. AAV-GFP (AAV8-TBG-PI-eGFP-WPRE-bGH, catalogue no. AV-8-PV0146) and AAV-Cre (AAV8-TBG-PI-Cre-rBG, catalogue no. AV-8-PV1091) were purchased from the University of Pennsylvania Vector Core. AAV-lsl-DTA contains a strong SV40 stop element cloned from the Lox-Stop-Lox TOPO plasmid (Addgene plasmid no. 11584). AAV-flex-DTA was modified from pAAV-mCherry-flex-DTA (Addgene plasmid no. 58536) with the following changes: the EF-1α promoter was swapped with the TBG promoter, and mCherry was swapped with EGFP. HEK293T cells were transfected and grown on Corning multi-layer flasks to produce the viruses. The viruses were purified by iodixanol (Sigma-Aldrich) gradient ultracentrifugation, and titred by qPCR and SYPRO Ruby (ThermoFisher) protein gel staining with standards.

Animals. Tert^CreERT2/+ mice were bred with the Rosa26 reporter (Gt(ROSA)26Sortm14(CAG-tdTomato)Hze/J) line to generate Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice for analysis. Two-month-old mice were intraperitoneally injected with tamoxifen (Cayman, 1 mg per 10 g body weight) dissolved in 100 µl sesame oil (Sigma-Aldrich). Sparse labelling was achieved by injecting tamoxifen at 0.08 mg per 10 g body weight. EdU (Carbosynth) was administered via drinking water (1 mg/ml) daily for seven days. AAV was diluted to 4 × 10^11 genome particles in 100 µl normal saline (per mouse), and injected intravenously. For DDC injury, mice received diet TD.07571 (Harlan) containing 0.1% DDC (Sigma-Aldrich) ad libitum. For CCl4 injury, mice were injected with liquid CCl4 (Sigma-Aldrich, 10 µl per 10 g body weight) dissolved in sesame oil (Sigma-Aldrich).
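All of the doses above are either scaled to body weight (tamoxifen, CCl4) or fixed per mouse (AAV at 4 × 10^11 genome particles in 100 µl). A minimal dose-calculation sketch follows; the 25 g body weight and the 10^13 genome particles per ml stock titre are hypothetical, and the helper names are illustrative only.

```python
def tamoxifen_dose_mg(body_weight_g: float, mg_per_10g: float = 1.0) -> float:
    """Tamoxifen dose scaled by body weight (1 mg per 10 g by default)."""
    return mg_per_10g * body_weight_g / 10.0


def ccl4_volume_ul(body_weight_g: float, ul_per_10g: float = 10.0) -> float:
    """Volume of liquid CCl4 before dilution in oil (10 µl per 10 g by default)."""
    return ul_per_10g * body_weight_g / 10.0


def aav_stock_volume_ul(stock_gp_per_ml: float, dose_gp: float = 4e11) -> float:
    """Volume of AAV stock (µl) that delivers `dose_gp` genome particles."""
    return dose_gp / stock_gp_per_ml * 1000.0


weight_g = 25.0  # hypothetical body weight
print(tamoxifen_dose_mg(weight_g))         # 2.5 (standard labelling dose, mg)
print(tamoxifen_dose_mg(weight_g, 0.08))   # sparse-labelling dose, about 0.2 mg
print(ccl4_volume_ul(weight_g))            # 25.0
stock_ul = aav_stock_volume_ul(1e13)       # hypothetical stock titre (GP per ml)
print(round(stock_ul, 2), round(100.0 - stock_ul, 2))  # 40.0 60.0 (stock, saline to 100 µl)
```

The sparse-labelling dose is 12.5-fold lower than the standard dose (0.08 versus 1 mg per 10 g), which is what shifts labelling from a population to isolated clones.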
Statistics. No statistical methods were used to predetermine sample sizes. When comparing two groups, P values were determined by two-sided unpaired t-test. When comparing more than two groups, P values were determined by one-way ANOVA with Tukey's HSD test as the post hoc analysis. Data significance was also tested by non-parametric statistics using the two-sided unpaired Wilcoxon–Mann–Whitney test for two-group comparisons, and Kruskal–Wallis one-way ANOVA on ranks with the Conover–Iman test as the post hoc analysis for more than two groups. The Kolmogorov–Smirnov test was used to compare the distribution patterns of continuous variables. The animals were randomly assigned to each experimental or control group. The investigators were not blinded to allocation during experiments and outcome assessment. Data are presented as mean ± s.e.m. Graphs were generated by the ggplot2 package in R.
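Two of the quantities named above are simple to compute directly: the mean ± s.e.m. used for reporting, and the two-sample Kolmogorov–Smirnov statistic D (the largest gap between two empirical cumulative distributions). The standard-library sketch below computes D only. It does not compute P values, for which a statistics library would normally be used (for example SciPy's `scipy.stats.ks_2samp`, `scipy.stats.ttest_ind` or `scipy.stats.mannwhitneyu`). This is not the authors' R code, and the input values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


m, sem = mean_sem([1.0, 2.0, 3.0])
print(m, round(sem, 3))                                    # 2.0 0.577
print(ks_statistic([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))      # 0.0
print(ks_statistic([0.1, 0.2], [0.8, 0.9]))                # 1.0
```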
FACS experiments. Cells were isolated by standard two-step collagenase perfusion. Liver perfusion medium (Life Technologies) and filtered (0.22 µm) liver digest medium (Life Technologies) were perfused sequentially via the portal vein, according to the manufacturer's instructions. Dissociated liver was passed through a 100-µm cell strainer and the hepatocytes were enriched by low-speed centrifugation (50g for 3 min) three times in hepatocyte wash medium (Life Technologies). Cells were analysed and/or sorted with a BD Aria II flow cytometer using a 100-µm nozzle. Dead cells were excluded by TO-PRO-3 (1 µM) or DAPI (1 µM) (Life Technologies) incorporation. For ploidy analysis, hepatocytes were incubated in Hoechst 33342 (15 µg/ml) and reserpine (5 µM) at 37 °C for 30 min before analysis.
Immunofluorescence, immunohistochemistry, EdU detection, single-molecule RNA FISH and Sirius Red staining. Livers were cut into small blocks and fixed in zinc-buffered formalin (Anatech). For immunofluorescence, tissue blocks were fixed overnight at 4 °C, cryoprotected in 30% (w/v) sucrose, embedded in OCT, snap-frozen and cut into 7-µm cryosections. For thick tissue analysis, tissue blocks were briefly fixed, embedded in low-melting agarose and cut into 300-µm sections using a vibratome, as previously described. For immunohistochemistry, tissue blocks were fixed overnight at 4 °C, incubated in 70% ethanol overnight, embedded in paraffin and cut into 5-µm sections. Antigen retrieval was performed with citrate (pH 6) buffer (Biogenex) for 10 min using a pressure cooker. Slides were stained with primary and secondary antibodies in blocking buffer (1% BSA, 5% donkey serum, 0.25% Triton-X in PBS) overnight at 4 °C, incubated with 1 mM DAPI for 5 min at room temperature, and mounted in Aqua-Poly/Mount (Polysciences) or Vectashield with DAPI (Vector Laboratories). DAB Peroxidase Substrate Kit (Vector Laboratories) or Emerald chromogen kit (Abcam) were used for immunohistochemistry. EdU incorporation was detected using the Click-iT EdU Alexa Fluor 488 Imaging Kit (Life Technologies). For analysis of cytospun samples, cells were FACS-sorted and cytospun (500 r.p.m./28g for 5 min) onto slides. Slides were fixed in 4% (v/v) PFA for 5 min, stained with primary and secondary antibodies in blocking buffer for 1 h at room temperature, and then mounted in ProLong Gold with DAPI mounting medium (Life Technologies). Alternatively, slides were fixed in 4% (v/v) PFA for 20 min and processed for single-molecule RNA FISH using an RNAscope 2.0 HD Detection-RED kit (ACDbio) according to the manufacturer's instructions. Sirius Red staining for collagen deposits was performed with Fast Green as the counterstain, using a staining kit (Chondrex), according to the manufacturer's instructions.
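The cytospin setting is given both as 500 r.p.m. and as 28g. The two are linked through the rotor radius by the standard relation RCF = 1.118 × 10^-5 × r(cm) × r.p.m.². The Methods do not state the radius; the roughly 10 cm used below is only what the quoted pair implies. The sketch converts between the two units under that assumption.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard rotor formula: RCF = 1.118e-5 * r_cm * rpm**2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (r.p.m.) needed to reach `rcf` at the given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def implied_radius_cm(rpm: float, rcf: float) -> float:
    """Rotor radius consistent with a quoted r.p.m./g pair."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


print(round(implied_radius_cm(500, 28), 1))  # 10.0
print(round(rcf_from_rpm(500, 10.0), 2))     # 27.95
```

Because the radius differs between cytocentrifuges, quoting the g value makes the setting transferable between instruments.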

RT-qPCR and RNA-seq. RT-qPCR and RNA-seq were performed on TERT^High and TERT^Low hepatocytes isolated by FACS from three Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice three days after tamoxifen treatment. Hepatocytes were sorted directly into TRIzol LS (Life Technologies). Total RNA was extracted and purified using an RNeasy Micro Kit (Qiagen) according to the manufacturer's instructions. qPCR was performed using the following primers: Tert (pair 1) CCACGTATGTGTCCATCAGC/TAGAGGATTGCCACTGGCTC; Tert (pair 2) ATCTGCAGGATTCAGATGCC/GCAGGAAGTGCAGGAAGAAG; Tert (pair 3) TGGCTTGCTGCTGGACACTC/TGAGGCTCGTCTTAATTGAGGTCTG; Gtf2b CTCTGTGGCGGCAGCAGCTATTT/CGAGGGTAGATCAGTCTGTAGGA. qPCR reactions were carried out using Brilliant II SYBR Green master mix (Stratagene) and a Roche LightCycler 480. Quantitation cycle (Cq) values were determined by the second derivative maximum method, and fold changes were calculated as 2^−ΔΔCq. RNA-seq libraries were constructed using a KAPA Stranded mRNA-Seq Kit (Kapa). Libraries were sequenced on the Illumina NextSeq platform, generating about 55–75 million 75-bp paired-end reads per library. Three biological replicates per sample were analysed. Raw reads were trimmed by TrimGalore 0.4.0 (Babraham Bioinformatics), mapped to mm10 by TopHat 2.0.13, and analysed by the DESeq2 package.
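The 2^−ΔΔCq calculation first normalizes the target gene's Cq to a reference gene within each sample, then compares the two samples. A minimal sketch follows, assuming Tert as the target and Gtf2b as the reference gene (the Methods list Gtf2b primers but do not state its role). The Cq values are hypothetical, and the method assumes roughly 100% amplification efficiency for both primer sets.

```python
def fold_change_ddcq(cq_target_test: float, cq_ref_test: float,
                     cq_target_ctrl: float, cq_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ddCq method (assumes ~100% PCR efficiency)."""
    dcq_test = cq_target_test - cq_ref_test   # normalize target to reference gene
    dcq_ctrl = cq_target_ctrl - cq_ref_ctrl
    ddcq = dcq_test - dcq_ctrl                # compare test sample with control
    return 2.0 ** (-ddcq)


# Hypothetical Cq values: test = TERT-High sort, control = TERT-Low sort.
print(fold_change_ddcq(28.0, 22.0, 31.0, 22.0))  # 8.0
```

A lower Cq means more template, so a ΔΔCq of −3 corresponds to 2^3 = 8-fold more target in the test sample.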

Imaging analysis. Fluorescent images were analysed using Leica LAS AF, ImageJ, Adobe Photoshop and FluoRender. Area index was defined as the liver area covered by Tomato+ cells as a percentage of total area, and quantified by ImageJ. 3D reconstruction was performed using FluoRender. Multicellular clones were imaged in 258 × 258 × 100-µm^3 volumes using a Leica SP8 confocal microscope or a Prairie Ultima IV two-photon microscope. Clones composed of more than eight cells often extended beyond the imaging volume, and were therefore counted as eight cells. The surface planar view was created by maximum projection of the first 12-µm volume close to the surface to approximate staining results from thin sections. For co-immunostaining with GS, 580 × 580 × 100-µm^3 volumes were imaged. Stitched single-plane images were processed from individual tiles by Adobe Photoshop. The numbers of EdU+, Ki-67+, GS+ and CK19+ hepatocytes were counted manually.
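The clone-size and clone-density readouts (Fig. 3c, d) follow from two rules above: clones larger than eight cells are recorded as eight, and each field covers 258 × 258 × 100 µm^3. The sketch below applies both rules to hypothetical clone sizes; the helper names and the per-mm^3 normalization are illustrative choices, not taken from the paper.

```python
from typing import Sequence

CLONE_SIZE_CAP = 8                      # clones of more than 8 cells were counted as 8
IMAGING_VOLUME_UM3 = 258 * 258 * 100    # per-field volume quoted in the Methods


def capped_mean_clone_size(sizes: Sequence[int], cap: int = CLONE_SIZE_CAP) -> float:
    """Mean clone size after truncating each clone at `cap` cells."""
    if not sizes:
        raise ValueError("no clones")
    capped = [min(s, cap) for s in sizes]
    return sum(capped) / len(capped)


def clones_per_mm3(n_clones: int, n_fields: int = 1,
                   field_volume_um3: float = IMAGING_VOLUME_UM3) -> float:
    """Clone density, converting the imaged volume to mm^3 (1 mm^3 = 1e9 µm^3)."""
    total_mm3 = n_fields * field_volume_um3 / 1e9
    return n_clones / total_mm3


sizes = [1, 2, 4, 11]                   # hypothetical clone sizes from one field
print(capped_mean_clone_size(sizes))    # 3.75
print(round(clones_per_mm3(3), 1))      # 450.7
```

Because large clones are truncated, a capped mean can only underestimate the true mean clone size, so later time points with more large clones are the most affected.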
TRAP assays. The telomeric repeat amplification protocol (TRAP) was carried out according to a previously established protocol. FACS-sorted cells or homogenized tissue were lysed in NP40 buffer (25 mM HEPES-KOH, 400 mM NaCl, 1.5 mM MgCl2, 10% glycerol, 0.5% NP40 and 1 mM DTT (pH 7.5), supplemented with protease inhibitors).

Ethical compliance. All animal protocols were approved by the Institutional 
Animal Care and Use Committee at Stanford University. All experiments complied 
with the relevant ethical regulations of Stanford University. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. Codes are available from the corresponding author upon 
request. 

Data availability. The source data for the RNA-seq study are available in the NCBI 
Gene Expression Omnibus (GEO) repository under accession number GSE104415. 
Source Data for Figs. 1–5 and Extended Data Figs. 1–7, 9 and 10 are available with the online version of the paper.
Extended Data Fig. 1 | Generation and characterization of the Tert^CreERT2/+ knock-in line. a, Tert^CreERT2 targeting strategy and Southern blot strategy. b–d, Southern blots using a 5′ probe (b), a NeoR probe (c), and a 3′ probe (d). KI, knock-in cells; WT, wild-type cells. For gel source data, see Supplementary Fig. 1. e–g, Tert^CreERT2/+ Rosa26^mTmG/+ mouse ES cells, which respond to Cre-mediated recombination by switching from membrane Tomato to membrane EGFP expression (e), showed either membrane Tomato (f, overlaid on bright-field image) or membrane EGFP (g, overlaid on bright-field image) in response to 500 nM 4-hydroxy tamoxifen (4-HT). h, Hepatocytes from Tert^CreERT2/+ Rosa26^LSL-Tomato/+ livers before and after FACS enrichment. i, Tamoxifen dose-response curve for Tert^CreERT2/+ Rosa26^LSL-Tomato/+ livers (n = 3 mice for each group; horizontal bar shows mean). j, Quantification of the TRAP assay shown in Fig. 1e by densitometry. k–n, Co-immunofluorescence for Tomato (red) and CD45 (k, blood cells, 202 cells examined), CD68 (l, Kupffer cells, 179 cells examined), GFAP (m, stellate cells, 158 cells examined) and PECAM (n, endothelial cells, 167 cells examined) in Tert^CreERT2/+ Rosa26^LSL-Tomato/+ livers after 3-day trace with DAPI (blue) staining. Experiments repeated twice for b–d, f, g, k–n. Scale bars, 100 µm in g, h; 50 µm in k–n.
Extended Data Fig. 2 | Ploidy and nuclear profiles of the TERT^Low and TERT^High lineages. a–c, Ploidy analysis by Hoechst incorporation and FACS in TERT^Low (a) and TERT^High (b) hepatocytes. c, Quantification showed no significant difference between TERT^Low and TERT^High cells regarding ploidy (n = 5 mice, each represented by unique dot shapes). d–f, Nucleus count by Tomato (red), phalloidin (green) and DAPI (blue) in livers traced for 3 days (d) and 6 months (e). f, Quantification showed no significant difference between TERT^Low and TERT^High cells in binucleus fractions (n = 4 mice for each group, each represented by unique dot shapes). Experiments repeated twice. Scale bar, 50 µm.
Extended Data Fig. 3 | Characterization of the lineage expansion of TERT^High hepatocytes. a–e, Immunofluorescence performed on Tert^CreERT2/+ Rosa26^LSL-Tomato/+ livers after one-year trace showed that TERT^High hepatocytes gave rise only to hepatocytes. f–i, Repeated injections (f) showed that TERT^High cells formed a constant proportion of the liver. Lineage expansion over one injection (g) and three injections (h) was quantified (i, n = 3 mice for each group; horizontal bars show mean). j, Heat map showing differentially regulated genes among all TERT^Low and TERT^High samples. Class 1 and class 2 refer to genes significantly downregulated and upregulated in TERT^High samples, respectively. Genes assigned to DAVID-generated annotation clusters shown on the right. Experiments repeated twice. Scale bar, 200 µm.
a–d, Stitched images of immunofluorescence for Tomato protein (red) and GS (green) in liver sections from Tert^CreERT2/+ Rosa26^LSL-Tomato/+ mice treated with tamoxifen and traced for three days (a), three months (b), six months (c) or one year (d). e–g, FACS-isolated and cytospun … tamoxifen and traced for three days were stained for CPS1 (red) and GS (green) in TERT^Low (e) and TERT^High hepatocytes (f), and quantified for the GS+ fraction of all cells (g, n = 3 mice; horizontal bars show mean). Experiments repeated three times. Scale bars, 200 µm.
Extended Data Fig. 5 | Distribution of proliferating hepatocytes
in Tert+/+ and TertCreERT2/+ livers in homeostasis and after injury.

a-f, Livers were stained with anti-Ki-67 antibody by standard
immunohistochemistry. a-d, Ki-67+ nuclei are indicated by brown colours
in uninjured livers (a, b) and CCl4 (10 μl per 10 g weight) injured livers
(c, d), with haematoxylin counterstain in light blue. e, f, Green chromogen
was used to indicate Ki-67+ nuclei in DDC (0.1%) treated livers.
Hepatocyte nuclei were distinguished by size and morphology. Examples
of Ki-67+ hepatocyte nuclei are shown in insets. g, Quantification of Ki-67+
hepatocytes and their distribution along the central-portal axis. The
position index (P.I.) was determined from the distance to the nearest
central vein (CV) (x), the distance to the nearest portal vein (PV) (y),
and the distance between the central and portal veins (z), following the law
of cosines. h-j, Two-sided Kolmogorov-Smirnov tests were performed to
analyse the distributions of Ki-67+ hepatocytes along the central-portal
axis. Histograms (bin width = 0.1) and shaded curves of the kernel density
estimation with Gaussian approximation are shown (mean ± s.e.m.). No
significant differences were found between Tert+/+ and TertCreERT2/+ livers
in uninjured livers (h, n = 4 mice for each group; each mouse represented
by unique dot shapes; P = 0.58), in CCl4-injured livers (i, n = 3 mice for
each group; each mouse represented by unique dot shapes; P = 0.32), or in
DDC-injured livers (j, n = 3 mice for each group; each mouse represented
by unique dot shapes; P = 0.98). Experiments repeated twice. Scale bar,
200 μm.
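The legend names the three distances behind the position index and the test applied to its distribution, but not the exact formula. The sketch below is one plausible reading. It assumes the P.I. is the projection of each nucleus onto the CV-PV axis divided by z, so that 0 marks the central vein and 1 the portal vein. The function names `position_index` and `ks_statistic` are hypothetical, and the KS function returns only the D statistic, not a P value.

```python
import bisect
from typing import Sequence


def position_index(x: float, y: float, z: float) -> float:
    """Assumed P.I.: projection of a nucleus onto the CV->PV axis, divided by z.

    x: distance to the nearest central vein (CV)
    y: distance to the nearest portal vein (PV)
    z: distance between that CV and PV
    Law of cosines at the CV vertex: y^2 = x^2 + z^2 - 2*x*z*cos(theta),
    so the projection along the axis is x*cos(theta) = (x^2 + z^2 - y^2) / (2*z).
    """
    projection = (x * x + z * z - y * y) / (2.0 * z)
    # Normalize by z (0 = CV, 1 = PV) and clamp nuclei projecting past either vein.
    return min(max(projection / z, 0.0), 1.0)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for v in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

For P values, `scipy.stats.ks_2samp` computes the same two-sample statistic and also returns a two-sided P value.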
Extended Data Figure 6 | EdU incorporation assays. a, Scheme of
experiments. b-g, EdU incorporation in livers of TertCreERT2/+
Rosa26LSL-Tomato/+ mice treated with tamoxifen, traced for three days,
then treated with EdU in drinking water for 7 days (1 mg ml⁻¹); overlay
image (b), HNF4A (c), DAPI (d), EdU (e) and Tomato (f). Dashed boxes,
EdU+HNF4A+Tomato+ cells. g, Quantification of EdU incorporation
into hepatocytes (n = 5 mice, each represented by unique dot colours).
h-k, EdU incorporation was compared between livers of Tert+/+ (h) and
TertCreERT2/+ (i) mice. Co-immunofluorescence for GS (red) and CK19 (white)
was overlaid with EdU (green) and DAPI (blue). j, Quantification of the
distribution of EdU+ hepatocytes (pericentral, in GS+ zones; periportal,
0-2 cell layers adjacent to the portal vein space or CK19+ bile ducts;
mid-lobular, neither pericentral nor periportal). Dot colours represent
individual mice. k, Total EdU+ hepatocytes in Tert+/+ and TertCreERT2/+
livers (n = 5 mice for Tert+/+ livers; n = 4 mice for TertCreERT2/+ livers).
Experiments repeated twice. Scale bars, 50 μm in d, 200 μm in i.
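The zoning rule in panel j can be written as a small classifier. This is a sketch under stated assumptions. The `EdUHepatocyte` record and its fields are hypothetical. Distances are counted in whole cell layers, starting at 0 for cells directly adjacent to the portal vein space or a CK19+ duct. If a cell happens to meet both criteria, membership of a GS+ zone is checked first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

ZONES = ("pericentral", "mid-lobular", "periportal")


@dataclass
class EdUHepatocyte:
    in_gs_zone: bool       # hypothetical field: lies within a GS+ pericentral zone
    layers_to_portal: int  # hypothetical field: cell layers to PV space or nearest CK19+ duct


def zone_of(cell: EdUHepatocyte) -> str:
    if cell.in_gs_zone:             # assumption: GS+ membership takes precedence
        return "pericentral"
    if cell.layers_to_portal <= 2:  # "0-2 cell layers adjacent to" PV space or CK19+ ducts
        return "periportal"
    return "mid-lobular"            # neither pericentral nor periportal


def zone_fractions(cells: List[EdUHepatocyte]) -> Dict[str, float]:
    """Fraction of EdU+ hepatocytes falling in each zone for one mouse."""
    if not cells:
        raise ValueError("no EdU+ hepatocytes to classify")
    counts = Counter(zone_of(c) for c in cells)
    return {z: counts[z] / len(cells) for z in ZONES}
```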
Extended Data Figure 7 | Single-molecule RNA FISH on wild-type
hepatocytes. a, Experiment performed on wild-type hepatocytes
isolated by FACS and cytospun. Red foci show individual Tert mRNA
molecules. b, Control experiment performed by omitting the detection probe for Tert.
c, Quantification by focus counts (n = 3 mice, each represented by unique
dot shapes; mean ± s.e.m.). Experiments repeated three times. Scale bar,
50 μm.
Extended Data Figure 8 | Responses of Tert+/+ and TertCreERT2/+ livers
to injuries. a, b, Haematoxylin and eosin (H&E) staining of uninjured
livers. c, d, H&E staining of livers 3 days after CCl4 injection. White dotted
lines encircle the damaged pericentral area. e, f, H&E staining of livers
7 days after CCl4 injection. g, h, H&E staining of livers 1 month after DDC
treatment. Experiments repeated five times. Scale bar, 200 μm.
Extended Data Figure 9 | Progeny of TERThigh hepatocytes can adopt
ductal fate after DDC injury. a-d, Immunofluorescence analysis of
TertCreERT2/+ Rosa26LSL-Tomato/+ livers treated with tamoxifen and DDC,
and traced for 1 month (a, overlay image; b, Tomato; c, CK19; d, DAPI).
e, Quantification of the percentage of CK19+Tomato+ cells among all
Tomato+ cells (n = 5 mice; mean ± s.e.m., 10.0 ± 1.2%). f, Quantification of
the percentage of CK19+Tomato+ cells among all CK19+ cells (n = 5 mice;
mean ± s.e.m., 6.1 ± 1.0%). Bars show mean. Experiments repeated three
times. Scale bar, 50 μm.
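Panels e and f summarize per-mouse percentages as mean ± s.e.m. across n = 5 mice. The sketch below shows that arithmetic, assuming the s.e.m. is the sample standard deviation divided by √n. The counts in the usage lines are invented placeholders, not data from the figure.

```python
import math
import statistics
from typing import List, Tuple


def percent(part: int, whole: int) -> float:
    """Percentage of, e.g., CK19+Tomato+ cells among all Tomato+ cells in one mouse."""
    return 100.0 * part / whole


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. with n - 1 denominator / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two mice to estimate s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical (double-positive, Tomato+) counts for five mice; not data from the paper.
counts = [(12, 120), (9, 100), (11, 95), (8, 90), (13, 110)]
per_mouse = [percent(p, w) for p, w in counts]
m, sem = mean_sem(per_mouse)
print(f"{m:.1f} ± {sem:.1f}%")
```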
Metabolic enzyme PFKFB4 activates transcriptional
coactivator SRC-3 to drive breast cancer

Subhamoy Dasgupta1*, Kimal Rajapakshe1, Bokai Zhu1, Bryan C. Nikolai1, Ping Yi1, Nagireddy Putluri1, Jong Min Choi1,
Sung Y. Jung, Cristian Coarfa1, Thomas F. Westbrook, Xiang H.-F. Zhang, Charles E. Foulds1*, Sophia Y. Tsai1,
Ming-Jer Tsai1 & Bert W. O'Malley1*


Alterations in both cell metabolism and transcriptional programs 
are hallmarks of cancer that sustain rapid proliferation and 
metastasis¹. However, the mechanisms that control the interaction
between metabolic reprogramming and transcriptional 
regulation remain unclear. Here we show that the metabolic 
enzyme 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase 4 
(PFKFB4) regulates transcriptional reprogramming by activating 
the oncogenic steroid receptor coactivator-3 (SRC-3). We used a 
kinome-wide RNA interference-based screening method to identify 
potential kinases that modulate the intrinsic SRC-3 transcriptional 
response. PFKFB4, a regulatory enzyme that synthesizes a potent 
stimulator of glycolysis², is found to be a robust stimulator of SRC-3
that coregulates oestrogen receptor. PFKFB4 phosphorylates SRC-3 
at serine 857 and enhances its transcriptional activity, whereas either 
suppression of PFKFB4 or ectopic expression of a phosphorylation- 
deficient Ser857Ala mutant SRC-3 abolishes the SRC-3-mediated 
transcriptional output. Functionally, PFKFB4-driven SRC-3 
activation drives glucose flux towards the pentose phosphate 
pathway and enables purine synthesis by transcriptionally 
upregulating the expression of the enzyme transketolase. In 
addition, the two enzymes adenosine monophosphate deaminase-1 
(AMPD1) and xanthine dehydrogenase (XDH), which are involved 
in purine metabolism, were identified as SRC-3 targets that may or 
may not be directly involved in purine synthesis. Mechanistically, 
phosphorylation of SRC-3 at Ser857 increases its interaction with the 
transcription factor ATF4 by stabilizing the recruitment of SRC-3 
and ATF4 to target gene promoters. Ablation of SRC-3 or PFKFB4 
suppresses breast tumour growth in mice and prevents metastasis 
to the lung from an orthotopic setting, as does Ser857Ala-mutant
SRC-3. PFKFB4 and phosphorylated SRC-3 levels are increased 
and correlate in oestrogen receptor-positive tumours, whereas, in 
patients with the basal subtype, PFKFB4 and SRC-3 drive a common 
protein signature that correlates with the poor survival of patients 
with breast cancer. These findings suggest that the Warburg pathway 
enzyme PFKFB4 acts as a molecular fulcrum that couples sugar 
metabolism to transcriptional activation by stimulating SRC-3 to 
promote aggressive metastatic tumours. 

Among the landscape of genetic alterations that drive aggressive
metastatic tumours, the transcriptional coregulator SRC-3 is an
abundantly deregulated oncogene³⁻⁵. Importantly, dynamic interactions
of SRC-3 with its partners and its subsequent recruitment to target genes
are finely regulated by post-translational modifications of SRC-3⁶.
Phosphorylation of SRC-3 can alter its transcriptional activity, protein
stability and subcellular localization⁷⁻⁹, and deregulated kinase
signalling that hyperactivates SRC-3 is a hallmark of many tumours¹⁰,¹¹.
As a starting point for identifying kinases that modulate SRC-3
transcriptional activity, we performed an unbiased RNA interference (RNAi)
screening assay using a kinome library containing short interfering
RNAs (siRNAs) that target 636 human kinases (median 3 siRNAs
per kinase) in the presence of a GAL4 DNA-binding domain-fused
SRC-3 (pBIND-SRC-3)¹² and a luciferase reporter gene containing
GAL4 DNA-binding sites (pG5-luc) (Fig. 1a). The concentration of
the pBIND-SRC-3 construct needed to obtain luciferase readings in a
linear range was standardized along with the dose of kinase siRNAs
required to observe significant alterations in SRC-3 intrinsic activity
(Extended Data Fig. 1a, b). As a positive control, we used siRNAs that
target PRKCZ1, a protein kinase known to activate SRC-3¹³, and compared
the repression of coregulator activity after kinase knockdown with that
after non-targeting control green fluorescent protein (GFP) siRNAs
(Extended Data Fig. 1c). Kinome-wide screening identified several
kinases as modulators of SRC-3 activity (Fig. 1b, Extended Data Fig. 1d
and Supplementary Table 1), acting as either stimulators or repressors
compared to the controls (Extended Data Fig. 1e).
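The screen logic above can be sketched in two steps. First, each knockdown is normalized to the non-targeting GFP siRNA control. Second, each kinase is called a stimulator or repressor of SRC-3 activity. The hit criteria are not stated in this section, so the per-kinase median log2 ratio and the default 2-fold threshold are assumptions, and the kinase names are placeholders. Note the sign convention: knocking down a kinase that stimulates SRC-3 lowers the reporter signal.

```python
import math
import statistics
from typing import Dict, List, Tuple


def classify_kinases(
    readings: Dict[str, List[float]],
    gfp_controls: List[float],
    threshold: float = 1.0,  # assumed cut-off: |log2 ratio| >= 1, i.e. a 2-fold change
) -> Dict[str, Tuple[str, float]]:
    """readings maps kinase -> luciferase values, one per siRNA targeting it."""
    control_mean = statistics.mean(gfp_controls)
    calls: Dict[str, Tuple[str, float]] = {}
    for kinase, values in readings.items():
        # Summarize the siRNAs for one kinase by the median log2 ratio to control.
        score = statistics.median(math.log2(v / control_mean) for v in values)
        if score <= -threshold:
            calls[kinase] = ("stimulator", score)  # knockdown lowers SRC-3 activity
        elif score >= threshold:
            calls[kinase] = ("repressor", score)   # knockdown raises SRC-3 activity
        else:
            calls[kinase] = ("no call", score)
    return calls
```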

Ten kinases were designated as reproducible and significant hits in 
the screen (Fig. 1c and Extended Data Fig. 1f), among which metabolic 
kinase PFKFB4 was identified as the most robust positive regulator 
of SRC-3 activity. A secondary screen coupled with growth assays to 
identify the top-hit kinases that drive cancer cell proliferation also iden- 
tified PFKFB4 as the most dominant kinase that regulates cellular pro- 
liferation (Extended Data Fig. 1g). Silencing of PFKFB4 with different 
short hairpin RNAs (shRNAs) and siRNAs decreased SRC-3 activity 
(Extended Data Fig. 2a, b) in several cancer cell lines with reduced PFKFB4
levels (Extended Data Fig. 2c, d), whereas ectopic overexpression of 
PFKFB4 using adenoviral infection (ad-PFKFB4) enhanced SRC-3 
activity (Fig. 1d). Interestingly, SRC-3 protein levels were increased 
after ectopic PFKFB4 expression (Fig. 1e), but SRC-3 (also known as
NCOA3) mRNA levels were not affected (Extended Data Fig. 2e), and 
proximity ligation assays support a direct interaction between SRC-3 
and PFKFB4, consistent with PFKFB4-dependent regulation of SRC-3 
activity (Extended Data Fig. 2f). 

PFKFB4 is a bifunctional metabolic enzyme that synthesizes fructose-
2,6-bisphosphate (F2,6-BP), an important sugar-phosphate metabolite
that stimulates glycolysis¹⁴. PFKFB4 dovetails two antagonistic activities:
a kinase reaction that synthesizes F2,6-BP from fructose-6-phosphate
(F6P) and ATP, and, conversely, a phosphatase activity that hydrolyses
F2,6-BP into F6P and inorganic phosphate (Pi)'"°. These properties of
PFKFB4 prompted us to investigate whether PFKFB4-catalysed enzymatic
reactions could increase phosphorylation of SRC-3. An in vitro enzymatic
reaction containing F6P, ATP and varying concentrations of recombinant
PFKFB4 enzyme was incubated with purified full-length SRC-3 protein.
Increasing the amount of
PFKFB4 enzyme in the reaction concomitantly enhanced the Ser/Thr 
phosphorylation of SRC-3, indicating that the metabolic enzyme 
PFKFB4 can phosphorylate a protein substrate (Extended Data Fig. 3a). 
We investigated the phosphate donor in the PFKFB4 kinase reaction, 
and identified ATP as being required for SRC-3 phosphorylation by 
PFKFB4 (Extended Data Fig. 3b). These findings suggest that PFKFB4
can function as a protein kinase to phosphorylate SRC-3 by transferring
a phosphate group from ATP.

Fig. 1 | PFKFB4 is an essential activator of transcriptional coregulator
SRC-3. a, Schematics showing the RNAi kinome library screening
with SRC-3 transcriptional activity assay using

To confirm this observation, we performed a kinase assay using
[γ-³²P]ATP as the phosphate donor and
observed enhanced incorporation of phosphate from [γ-³²P]ATP into
SRC-3 protein upon increasing concentrations of the PFKFB4 kinase 
in the reaction (Fig. 2a). To identify the phosphorylation site(s) on 
SRC-3, we used recombinant glutathione S-transferase (GST)-fused 
SRC-3 fragments encoding various domains (Extended Data Fig. 3c) as 
substrates for an in vitro kinase reaction, and found that only the CBP- 
interacting domain (CID) of SRC-3!” is phosphorylated by PFKFB4 
(Fig. 2b). In vitro phosphorylated GST-SRC-3-CID protein was then 
analysed by mass spectrometry, and only one serine residue (Ser857) 
was identified as a phosphorylation target of PFKFB4 (Extended Data 
Fig. 3d). Consistent with this identification, mutation of Ser857 to 
alanine (Ser857Ala) abolished the phosphorylation of SRC-3-CID by 
PFKFB4 in vitro (Extended Data Fig. 3e), confirming that PFKFB4 
phosphorylates oncogenic coregulator SRC-3 at Ser857. 
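For reference, the two enzymatic activities of PFKFB4 described above, and the proposed protein-kinase reaction on SRC-3, can be written as reaction equations. ADP as the co-product and water in the hydrolysis come from standard stoichiometry; the text does not state them explicitly. The third line summarizes the phosphotransfer to Ser857 inferred from the in vitro assays.

```latex
\begin{aligned}
&\text{kinase:} && \text{F6P} + \text{ATP} \longrightarrow \text{F2,6-BP} + \text{ADP} \\
&\text{phosphatase:} && \text{F2,6-BP} + \text{H}_2\text{O} \longrightarrow \text{F6P} + \text{P}_\text{i} \\
&\text{proposed protein kinase:} && \text{SRC-3 (Ser857-OH)} + \text{ATP} \longrightarrow \text{SRC-3 (Ser857-OPO}_3^{2-}\text{)} + \text{ADP}
\end{aligned}
```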

Because increased glucose metabolism stimulates the kinase activ- 
ity of PFKFB4 required to maintain steady glycolysis!*, we measured 
the levels of phosphorylated SRC-3 (pSRC-3) under these conditions. 
HEK293T cells were transfected with Flag-tagged SRC-3 and PFKFB4, 
and then stimulated with an increasing concentration of glucose in 
culture medium, which revealed enhanced phosphorylation of SRC-3 
(Fig. 2c). Next we investigated the levels of pSRC-3-S857 in breast 
cancer cells under conditions of active glycolysis by immunoblotting 
with a pSRC-3-Ser857-antibody. MDA-MB-231 cells growing under 
a normal glucose condition (25 mM) showed robust phosphorylation
of SRC-3 at Ser857 compared to tumour cells cultured in low glucose 
conditions (5 mM) (Fig. 2d). Withdrawing glucose from the medium 
after growth in normal glucose conditions (25 mM) resulted in signif- 
icant loss of SRC-3 phosphorylation (Fig. 2d). Moreover, stable knock- 
down of PFKFB4 using two different shRNA constructs (shPFK#09 
and shPFK#20) (Fig. 2d and Extended Data Fig. 3f) abolished pSRC- 
3-Ser857 levels in breast cancer cells cultured in 25 mM glucose, indi- 
cating that PFKFB4-dependent SRC-3 phosphorylation on Ser857 is 
a highly selective modification under conditions conducive to active 
glycolysis. We expressed the phosphorylation-defective mutant SRC- 
3(Ser857Ala) or wild-type SRC-3 protein in SRC-3-ablated cells, and 
under conditions of active glycolysis the levels of pSRC-3-Ser857 are 
increased in wild-type SRC-3 cells compared to the SRC-3(Ser857Ala) 
mutant cells (Extended Data Fig. 3g). Importantly, the introduction
of fructose-1,6-bisphosphate (FBP) alone into glucose-starved cells
permeabilized with streptolysin O rescued pSRC-3-Ser857 levels
(Extended Data Fig. 3h), indicating that this phosphorylation event is
linked to the energy status of the cell!”.

n represents biologically independent samples.

To measure the importance of this modification for the intrinsic
activity of SRC-3, we transduced cancer cells with adenovirus
expressing PFKFB4 (ad-PFKFB4) or control GFP, and cultured the
transduced cells in the presence of normal glucose (25 mM) or low
glucose (5 mM) levels. Enhanced expression of PFKFB4 along with
glucose stimulation significantly increased the transcriptional activity 
of SRC-3 (pBIND-SRC-3) compared to cells cultured in low glucose 
conditions, suggesting that PFKFB4-dependent SRC-3 phosphoryla- 
tion is important for the coactivator-driven transcriptional response 
(Extended Data Fig. 3i). To substantiate this observation, we used the 
phosphorylation-deficient pBIND-SRC-3(Ser857Ala) mutant or the 
phosphorylation-mimic pBIND-SRC-3(Ser857Glu) mutant in a similar 
transcriptional activation assay and found that the Ser857Ala mutant 
was significantly refractory to glucose-dependent PFKFB4 signalling 
(Fig. 2e). The Ser857Glu mutant was constitutively active even at low 
levels of glucose, and glucose stimulation failed to show any further 
activation (Extended Data Fig. 4a). Previous studies have identified 
several crucial sites in the kinase domain of PFKFB4 that are important 
for ATP binding’. When mutated to alanine, residues Gly46, Pro48, 
Gly51, Arg229 and Arg237 significantly decrease the binding affinity 
for ATP and result in reduced PFKFB4 kinase activity. We expressed 
these mutants in PFKFB4-silenced breast cancer cells and transcrip- 
tional assays confirmed significantly reduced SRC-3 activity and 
Ser857 phosphorylation (Extended Data Fig. 4b, c). Because SRC-3 is 
an established oestrogen receptor (ER) coactivator, we investigated the 
importance of glucose-dependent PFKFB4 signalling for ER-mediated
transcriptional activity. MCF-7 cells stably expressing an oestradiol 
(E2)-ER-dependent luciferase reporter gene (ERE-MAR-Luc cells)” 
were used to assay ER activity as a function of E2 and glucose in the 
medium. Glucose addition enhanced ER activity, whereas low glucose 
or SRC-3 silencing significantly repressed transcriptional output in 
response to E2 (Extended Data Fig. 4d). Overexpression of PFKFB4 
enhanced ER activity only in cells treated with E2 and glucose, whereas 
this PFKFB4-dependent increase in ER activity is repressed upon 
SRC-3 ablation (Extended Data Fig. 4e). Consistent with this observation,
the SRC-3(Ser857Ala) mutant failed to rescue the growth of
SRC-3-depleted cells compared to wild-type SRC-3 (Extended Data
Fig. 4f). These findings suggest that in glycolytic breast tumours,
PFKFB4 and SRC-3 can also hyperactivate ER activity in the presence
of E2, and phosphorylation of SRC-3 at Ser857 is a critical mark
required for transcriptional responses.

Fig. 2 | PFKFB4 phosphorylates SRC-3 by functioning as a protein
kinase. a, Top, recombinant GST-fused PFKFB4 incubated with full-length
SRC-3 in the presence of [³²P]ATP in an in vitro kinase assay. Bottom,
SRC-3 and PFKFB4 protein levels were analysed by immunoblotting.
b, In vitro kinase assay of PFKFB4 in the presence of SRC-3 fragments
expressing different domains or full-length (FL) SRC-3. c, HEK293T
cells expressing Flag-tagged SRC-3 and PFKFB4 cultured in different
concentrations of glucose and immunoprecipitated by Flag or pSer/Thr
antibodies followed by immunoblotting. d, MDA-MB-231 cells stably
expressing shRNAs targeting PFKFB4 (shPFK#09 and shPFK#20) or
control non-targeting (NT) shRNA grown in the presence of 5 mM or
25 mM glucose, or after glucose withdrawal (WD), in which cells were
cultured in 25 mM glucose for 24 h and then switched to 5 mM glucose for
6 h. Protein levels of pSRC-3-Ser857, PFKFB4 and β-actin were detected
by immunoblotting. e, HEK293T cells expressing pBIND, pBIND-SRC-3
or pBIND-SRC-3(Ser857Ala) were transduced with adenoviruses
expressing GFP or PFKFB4, and cultured in 5 mM or 25 mM glucose
followed by luciferase assay. Boxes represent the twenty-fifth and
seventy-fifth percentiles, lines represent the median, whiskers show the
minimum and maximum points, and the plus symbol indicates the mean.
n = 6 biologically independent experiments. ****P < 0.000001, two-way
ANOVA with Tukey's multiple comparisons test. Data in a-e are
representative of three biologically independent experiments with similar
results. See Source Data for exact P values.

PFKFB4 is an important regulator of glucose metabolism and
directs metabolic pathways required for biosynthesis of macromolecules
to sustain rapid proliferation in cancer cells. To identify the
physiological role of PFKFB4-dependent SRC-3 activation in tumour
metabolism, we performed an unbiased phenotypic screen to identify
the metabolites that are preferentially used by SRC-3-overexpressing
cells. For this we used a phenotype microarray analysis”’ containing
93 metabolites (Supplementary Table 2) arrayed in a microplate and
measured in real time the importance of these metabolites in supporting
SRC-3-dependent growth. We transduced mammary epithelial
MCF10A cells (with relatively low endogenous SRC-3) with adenovirus
expressing GFP or SRC-3, followed by the phenotype screen for
24 h. We identified enhanced proliferation of cells with gain of SRC-3
expression in the presence of glucose and purines such as adenosine
and inosine (Extended Data Fig. 5a-d). To investigate the role
of SRC-3 further and determine how activation by PFKFB4 affects
its regulation of metabolism in breast cancer cells, we performed
mass spectrometry-based metabolic profiling of MDA-MB-231 cells
expressing shRNAs that target SRC-3 or PFKFB4. Ablation of either
SRC-3 or PFKFB4 significantly reduced the intracellular pools of
ribose-5P (R5P) and of purine nucleotides and intermediates, such
as adenosine, xanthine and guanine (Fig. 3a and Extended Data
Fig. 5e, f). Overexpression of SRC-3 in MCF10A cells also resulted in
increased purine pools (Extended Data Fig. 5g). To measure the
direct contribution of PFKFB4 and SRC-3 to the regulation of glucose flux
towards the pentose phosphate pathway (PPP), we used isotope-labelled
[6-¹³C]glucose to trace the carbon flow”?. PFKFB4 and
SRC-3 depletion significantly reduced the ¹³C-enrichment of
ribulose-5P/xylulose-5P, important intermediary metabolites in the 
PPP and rate-limiting precursors for purine biosynthesis (Extended 
Data Fig. 6a). We investigated whether exogenous addition of purines 
could rescue the reduced growth rate of SRC-3-deficient breast 
cancer cells. As expected, loss of SRC-3 suppressed the growth of 
MDA-MB-231 and MCF-7 breast cancer cells*?, whereas supple- 
mentation of purines in the culture medium significantly rescued 
the growth defect, indicating that SRC-3 expression is crucial for the 
synthesis of purines for growth (Fig. 3b). 
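The ¹³C-enrichment readout used above can be illustrated with a common way of summarizing a mass isotopomer distribution. This is a generic sketch, not the authors' pipeline. It assumes the abundances [M+0, M+1, ..., M+n] have already been corrected for natural isotope abundance, and the function names are hypothetical.

```python
from typing import Sequence


def fractional_enrichment(isotopologues: Sequence[float]) -> float:
    """Mean 13C label per carbon from abundances [M+0, M+1, ..., M+n].

    n is the number of carbons in the metabolite (e.g. 5 for ribulose-5P).
    Abundances are assumed to be corrected for natural 13C abundance already.
    """
    n_carbons = len(isotopologues) - 1
    total = sum(isotopologues)
    labelled_carbons = sum(i * a for i, a in enumerate(isotopologues))
    return labelled_carbons / (n_carbons * total)


def labelled_fraction(isotopologues: Sequence[float]) -> float:
    """Fraction of molecules carrying at least one 13C atom (1 - M+0 share)."""
    return 1.0 - isotopologues[0] / sum(isotopologues)
```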

To identify the potential mechanisms of apparent SRC-3-driven 
purine synthesis, we performed gene expression analysis of enzymes 
involved in the PPP and purine synthesis. Knockdown of SRC-3 
reduced the mRNA expression of transketolase (TKT), adenosine 
monophosphate deaminase 1 (AMPD1), and xanthine dehydrogenase 
(XDH) (Extended Data Fig. 6b, c). These SRC-3 target genes were also 
found to be regulated by PFKFB4 knockdown (Fig. 3c) and their expres- 
sion was significantly enhanced in actively glycolytic breast cancer 
cells (Extended Data Fig. 6d). TKT is a major enzyme mediating the
non-oxidative PPP, whereas XDH and AMPD1, traditionally known to
regulate purine catabolism, are found to be regulated by SRC-37*”°.
Whether the switch in roles of these reversible enzymes XDH and AMPD1
depends on tumour metabolic state needs further investigation. Consistently,
[6-¹³C]glucose isotope-tracing experiments also confirmed reduced
levels of the TKT products sedoheptulose-7P (S7P) and erythrose-4P
(E4P) upon PFKFB4 or SRC-3 knockdown (Extended Data Fig. 6e, f).
Fig. 3 | SRC-3 phosphorylation by PFKFB4 enhances gene expression
of metabolic enzymes. a, Relative levels of metabolites altered by
shRNAs against PFKFB4 or SRC-3 compared to control non-targeting
shRNA in MDA-MB-231 cells. n = 3 biologically independent samples.
**P < 0.01, ***P < 0.001, ****P < 0.0001, two-way ANOVA with Tukey's
multiple comparisons test. b, Relative proliferation of MDA-MB-231 and
MCF-7 cells 4 days after treatment with siRNA targeting GFP (control)
or SRC-3 under the conditions indicated. Ade, adenine; Gua, guanine.
n = 5 biologically independent replicates. ****P < 0.0001, one-way
ANOVA with Tukey's multiple comparisons test. c, mRNA expression of
metabolic enzymes TKT, XDH and AMPD1 in MDA-MB-231 cells after
treatment with siRNAs targeting GFP (control), PFKFB4 or SRC-3. n = 3
biologically independent samples. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001, one-way ANOVA with Tukey's multiple comparisons
test. d, Immunoprecipitation (IP) of ATF4 from MDA-MB-231 cells
grown in 5 mM or 25 mM glucose after treatment with PFKFB4 shRNA,
SRC-3 shRNA, or non-targeting control shRNA, or after SRC-3 shRNA
plus re-expression of SRC-3(Ser857Ala). Levels of pSRC-3-Ser857
associated with ATF4 were detected by immunoblotting. IgG light chain
conjugated to horseradish peroxidase (HRP) was used to probe ATF4 in
immunoblotting. The ATF4 blot was exposed for shorter (light) and longer
(dark) time points to visualize faint bands. e, f, ChIP of ATF4, total SRC-3
and pSRC-3-Ser857 followed by qPCR from MDA-MB-231 cells treated
with 5 mM or 25 mM glucose compared to an IgG isotype control. SRC-3
antibodies were from either BD Biosciences or Cell Signaling Technology.
TKT (e) and AMPD1 (f) are shown. n = 3 biologically
independent samples used for ChIP. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001, one-way ANOVA with Tukey's multiple comparisons test
compared to 5 mM glucose groups. See Source Data for exact P values.
Unless stated otherwise, data are mean ± s.d.

To confirm that these genes are direct targets of SRC-3, we re-expressed
SRC-3 in MDA-MB-231 cells with depleted levels of endogenous SRC-3
protein (Extended Data Fig. 3g) and observed significant restoration of
SRC-3 target gene expression (Extended Data Fig. 7a). The addition of
exogenous purines also restored the primary growth defects in
PFKFB4-silenced MDA-MB-231 cells (Extended Data Fig. 7b), which showed
decreased incorporation of [U-¹³C]glucose carbon into purines
(Extended Data Fig. 7c). Although the metabolic effects may or may not
be directly regulated by the target genes AMPD1 or XDH, our findings
indicate that PFKFB4 and SRC-3 cooperate to drive glucose flux towards
purine generation.

To define how PFKFB4 phosphorylation of SRC-3 affects transcriptional
regulation of the three commonly regulated purine biosynthesis
genes defined above, we analysed the chromatin occupancy of SRC-3
on the promoters of TKT, XDH and AMPD1 by in silico analysis of an
existing SRC-3 chromatin immunoprecipitation followed by
sequencing (ChIP-seq) dataset”®. We identified strong overlap of SRC-3
occupancy with activating transcription factor 4 (ATF4)-binding sites”
on the three target genes (Extended Data Fig. 8a, b). Interestingly, ATF4
was recently shown to promote purine synthesis in response
to growth signals”®. To test whether SRC-3 interacts with ATF4,
we immunoprecipitated ATF4 from MDA-MB-231 cells growing in
either 25 mM or 5 mM glucose. Under conditions of enhanced glycolysis,
the interaction of ATF4 with pSRC-3-Ser857 increased robustly,
although the total ATF4 protein level was lower owing to reduced
nutrient stress compared to 5 mM glucose treatment. However, loss
of PFKFB4 or SRC-3, or re-expression of SRC-3(Ser857Ala) in
SRC-3-knockdown cancer cells, greatly reduced the association (Fig. 3d).
Next we performed ChIP and quantitative PCR (ChIP-qPCR) to measure
the chromatin occupancy of ATF4, pSRC-3-Ser857 and SRC-3 on the
target gene promoters. Breast cancer cells growing in the presence of
25 mM glucose showed increased occupancy of ATF4 and
pSRC-3-Ser857 on TKT (Fig. 3e), XDH (Extended Data Fig. 8c) and AMPD1
(Fig. 3f) promoters, whereas the loss of SRC-3 or PFKFB4 signifi- 
cantly reduced ATF4 chromatin occupancy on AMPD1 (Extended 
Data Fig. 8d). In addition, we found that SRC-3 recruitment to the gene
promoters is dependent on ATF4, as knockdown of ATF4 significantly 
reduced target gene expression and pSRC-3-Ser857 promoter occu- 
pancy (Extended Data Fig. 8e, f). These findings demonstrate that in 
actively glycolytic breast cancers, PFKFB4-dependent phosphorylation 
of SRC-3 at Ser857 promotes interaction with the transcription factor 
ATF4, thereby stabilizing the complex on chromatin and driving 
transcription of key metabolic enzymes. 
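ChIP-qPCR occupancy measured against an IgG isotype control is often expressed as percent input or as fold enrichment over IgG. The sketch below shows both calculations. It relies on assumptions the text does not state: ideal (100%) amplification efficiency and a known fraction of chromatin saved as input. The function names are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input chromatin recovered by an IP, assuming 100% PCR efficiency.

    input_fraction: share of chromatin saved as input (e.g. 0.01 for a 1% input).
    The input Ct is first adjusted to represent 100% of the chromatin.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(ct_antibody: float, ct_igg: float) -> float:
    """Enrichment of a specific antibody (e.g. anti-ATF4) over the IgG isotype control."""
    return 2 ** (ct_igg - ct_antibody)
```

With equal input for both IPs, the ratio of their percent-input values reduces to `fold_over_igg`.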

To study whether suppression of PFKFB4 or SRC-3 can affect the 
growth of breast tumours in vivo (Fig. 4a), we implanted MDA-MB-231 
cells stably expressing non-targeting shRNA, SRC-3 shRNA (Extended 
Data Fig. 3g) and PFKFB4 shRNA (Fig. 2d) into the mammary fat pad 
of female nude mice. Compared to controls, tumours with genetic loss of
SRC-3 or PFKFB4 showed substantially reduced growth and volume
(Fig. 4b and Extended Data Fig. 9a). Immunostaining with a human
Ki67 antibody showed significantly fewer proliferating cells in
SRC-3- or PFKFB4-ablated tumours compared to controls (Extended
Data Fig. 9b, c). To evaluate the functional significance of the Ser857 
phosphorylation of SRC-3 in breast tumour progression, we stably 
expressed shRNA-resistant wild-type SRC-3 or SRC-3(Ser857Ala) in
MDA-MB-231 cells with suppressed expression of endogenous SRC-3 
protein (Extended Data Fig. 3g). Rescuing expression with the exog- 
enous wild-type SRC-3 construct in SRC-3-depleted cells completely 
restored the growth of the breast tumours (Extended Data Fig. 9d), 
whereas the phosphorylation-deficient Ser857Ala mutant (Extended
Data Fig. 9d, e) was partially resistant to tumorigenesis six weeks after 
grafting the tumour cells (Fig. 4b and Extended Data Fig. 9a). 
Fig. 4 | Activation of the PFKFB4-SRC-3 axis drives breast tumour
primary growth and metastasis. a-g, MDA-MB-231 cells stably
expressing SRC-3 shRNA, PFKFB4 shRNA or SRC-3 shRNA plus
wild-type (WT) SRC-3 or the SRC-3(Ser857Ala) mutant were injected into
nude female mice. a, Schematics of the in vivo orthotopic xenograft
experiment. Tumour cells were injected in the mammary fat pad (n = 5
mice) and after 6 weeks primary tumours were resected and animals
were monitored by bioluminescence. b, Tumour volume. ****P < 0.0001,
one-way ANOVA with Tukey's multiple comparisons test. n = 5. Boxes
are as in Fig. 2e. c, Bioluminescence imaging of animals 4 weeks after
surgery. Representative images of three animals are shown from n = 5
mice for wild-type SRC-3, SRC-3(Ser857Ala) and PFKFB4 shRNA, and
n = 4 mice for SRC-3 shRNA. Residual or recurrent tumours at primary
sites were masked with black paper to visualize lung lesions. d, Histology
images showing lung sections stained with haematoxylin and eosin.
Arrows indicate micro-metastatic lesions. Scale bar, 100 μm. Data shown
are representative of four fields per slide from n = 5 animals per group.
e, Immunohistochemical images from primary tumours demonstrating
pSRC-3-Ser857 expression (red) co-stained with DAPI (blue). Scale
bars, 100 μm. Magnified image in the box shows the tumour boundary
as indicated by the dotted line. Scale bar, 200 μm. f, Quantification of
nuclear-stained pSRC-3-Ser857 in each group. Average of four fields per
slide from n = 5 mice per group. ****P = 0.0001, one-way ANOVA with
Dunnett's multiple comparisons test. See Source Data for exact P values.
Unless stated otherwise, data are mean ± s.d.

After resecting the primary tumours, we allowed the animals to survive
for four more weeks with weekly bioluminescence imaging (Fig. 4a) to
evaluate metastatic potential. Animals with primary tumours expressing
wild-type SRC-3 developed profound lung metastasis with a morbid
hunched-back posture, whereas suppression of SRC-3 or PFKFB4 or
expression of the SRC-3(Ser857Ala) phosphorylation-deficient mutant
all markedly reduced lung lesions (Fig. 4c and Extended Data Fig. 9f).
Pathological analysis identified only a few micro-metastatic lesions in
the lungs of animals with SRC-3(Ser857Ala), or SRC-3- and PFKFB4-ablated,
primary tumours (Fig. 4d), with no observed health issues during the
four weeks after surgery. These findings demonstrate that SRC-3 and
PFKFB4 are drivers of basal-subtype breast tumour growth and that
phosphorylation of SRC-3 at the Ser857 site is crucial for metastatic
progression of the disease. Immunostaining of the primary tumours with a
pSRC-3-Ser857 antibody detected increased nuclear-localized human
SRC-3 in the tumours collected from animals bearing wild-type SRC-3
tumours that progressed to aggressive metastatic disease, whereas
PFKFB4- or SRC-3-ablated tumours had significantly reduced nuclear
staining (Fig. 4e, f). Nuclear-localized pSRC-3-Ser857 represents active
SRC-3 in the tumour that in turn promotes target gene expression to
maintain tumour growth and metastasis. Importantly, this single
phosphorylation site modification was also found to be an indicator of
tumour metastasis mediated by ERK3 in a previous study”®. Taken
together, our data demonstrate that
the PFKFB4-SRC-3 signalling axis promotes tumour cell proliferation
by increasing purine synthesis (Extended Data Fig. 9g), which may also 
serve as a critical determinant of metastatic progression of the disease. 

To identify the clinical implications of this axis, we first analysed
expression of PFKFB4 in The Cancer Genome Atlas (TCGA) database and found
its expression to be significantly enhanced across all subtypes of breast
cancer (Extended Data Fig. 9h). Because SRC-3 is an ER coactivator, we
analysed expression of pSRC-3-Ser857 and PFKFB4 in ER-positive primary
breast tumours and adjacent normal tissues. Our data show increased levels
of pSRC-3-Ser857, PFKFB4 and SRC-3 in most tumours compared to normal
tissues (Extended Data Fig. 10a, b), and a significant correlation between
pSRC-3-Ser857 and PFKFB4 levels (r = 0.63, Extended Data Fig. 10c).
Because PFKFB4 expression is also increased in other breast tumour
subtypes, we performed protein array analyses using lysates of MDA-MB-231
cells with suppressed expression of SRC-3 or PFKFB4 protein, and compared
the significantly altered protein targets with those of cells expressing
the control non-targeting shRNA. By intersecting the proteins significantly
affected by both SRC-3 and PFKFB4 knockdown, we identified a common
proteomic signature (Extended Data Fig. 10d). Restricting this signature to
protein changes in the same direction, we evaluated the correlation of the
common PFKFB4-SRC-3 proteomic signature with patient survival in a cohort
of specimens from patients with breast cancer for which clinical
information was available. We found that the PFKFB4-SRC-3 common proteomic
signature is also associated with a decreased likelihood of survival in a
basal-like-subtype triple-negative patient cohort (Extended Data Fig. 10e).
These clinical associations are compatible with our in vivo experimental
observations and substantiate that the PFKFB4-SRC-3 axis is a molecular
powerhouse that propels breast tumorigenesis and its progression to
aggressive metastatic disease.

Here we have uncovered an interaction between the glycolytic pathway and
the oncogenic activation of the transcriptional coactivator SRC-3. The
Warburg effect is known to be one of the most dominant glucose metabolic
pathways across cancers, generating energy and macromolecules to sustain
rapid proliferation and tumour growth. We now find that a glycolytic
stimulator, the bifunctional enzyme PFKFB4, can also operate as a protein
kinase, at least in actively glycolytic tumours. After glucose uptake,
PFKFB4 catalyses the synthesis of F2,6BP from F6P and ATP, and our study
revealed that under these conditions PFKFB4 can also phosphorylate SRC-3
at Ser857. Phosphorylation of SRC-3 at Ser857 rapidly increases its
transcriptional activity and promotes expression of genes that drive
glucose flux towards purine synthesis (Extended Data Fig. 10f). The
PFKFB4-SRC-3 axis was found to be enriched in ER-positive breast tumours,
and was also found to promote a common proteomic signature that correlates
with worse outcomes in patients with triple-negative breast cancer, thereby
driving aggressive metastatic disease (Extended Data Fig. 10g). Our work
suggests that targeting the PFKFB4-SRC-3 axis may be therapeutically
valuable in breast tumours that are notably dependent on glucose
metabolism.
METHODS

Vectors and virus production. Commercially available shRNAs targeting the
3′ UTR of PFKFB4 (TRCN0000199909-sh09 and TRCN0000199820-sh20) and SRC-3
(TRCN0000370321-sh21 and TRCN0000365196-sh96) were obtained from Sigma.
Lentiviruses were produced by transient transfection of 293T cells using
Lipofectamine 2000 (Life Technologies) along with pMD2.G (a gift from
D. Trono, Addgene plasmid 12259) and psPAX2 (a gift from D. Trono, Addgene
plasmid 12260); the viral supernatants were collected after 48 h, then
precipitated and purified using PEG-it Virus Concentration Solution
(System Biosciences). The construct expressing the GAL4-responsive
luciferase reporter (pG5-luc) was obtained from Promega, and the
pBIND-SRC-3 construct was generated by inserting an in-frame fusion between
the GAL4 DNA-binding domain and the open reading frame of human SRC-3, as
previously described. The pBIND-SRC-3(S857A) and pBIND-SRC-3(S857E) mutants
were generated using the QuikChange Lightning site-directed mutagenesis
kit, as described earlier. The GST-SRC-3 fragment constructs were obtained
by cloning portions of SRC-3 in frame with GST. The N-terminal bHLH (amino
acids 1-320), serine/threonine (S/T) (amino acids 321-580), RID (amino
acids 581-840), CID (amino acids 841-1080) and HAT (amino acids 1081-1421)
domain constructs were generated as previously described. The expression
plasmid encoding SRC-3 with a C-terminal Flag tag was cloned into pSG5-Flag
(WT SRC-3), and the point mutation of serine 857 to alanine (Ser857Ala) was
generated by site-directed mutagenesis of the wild-type SRC-3 and
GST-SRC-3-CID constructs. The human PFKFB4 cDNA (NM_004567.3) was obtained
from Origene (RC201573). The PFKFB4 mutants Gly46Ala, Pro48Ala, Gly51Ala,
Arg230Ala and Arg238Ala were generated by site-directed mutagenesis. All
constructs were verified by Sanger sequencing. The siGENOME siRNAs against
PFKFB4, SRC-3 and ATF4 were obtained from Dharmacon.
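As a quick consistency check on the fragment boundaries listed above, the short Python sketch below maps a residue number to the GST-SRC-3 fragment that contains it. It shows that Ser857 falls in the CID fragment (amino acids 841-1080), the fragment later used as the substrate for mass spectrometric phosphosite mapping. The table layout and the function name `domain_of` are illustrative assumptions, not part of the original study.

```python
from bisect import bisect_left

# GST-SRC-3 fragment boundaries (inclusive amino-acid ranges) as listed in the Methods.
SRC3_FRAGMENTS = [
    ("bHLH", 1, 320),
    ("S/T", 321, 580),
    ("RID", 581, 840),
    ("CID", 841, 1080),
    ("HAT", 1081, 1421),
]
_FRAGMENT_ENDS = [end for _, _, end in SRC3_FRAGMENTS]


def domain_of(residue: int) -> str:
    """Return the name of the fragment containing a 1-based residue number."""
    i = bisect_left(_FRAGMENT_ENDS, residue)  # first fragment whose end >= residue
    if residue < 1 or i == len(SRC3_FRAGMENTS):
        raise ValueError(f"residue {residue} is outside 1-{_FRAGMENT_ENDS[-1]}")
    return SRC3_FRAGMENTS[i][0]


print(domain_of(857))  # Ser857 -> CID
```

Because the fragments are contiguous, a binary search over the fragment end positions is enough; no overlap handling is needed.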

The shRNA sequences were as follows:
shPFK#09 (TRCN0000199909): 5′-CCGGGCTGATTGGCTGCCACATTTCCTCGAGGAAATGTGGCAGCCAATCAGCTTTTTTG-3′;
shPFK#20 (TRCN0000199820): 5′-CCGGGCGCAGCTCTTAGGTGTTCACCTCGAGGTGAACACCTAAGAGCTGCGCTTTTTTG-3′;
shSRC-3#21 (TRCN0000370321): 5′-CCGGTGACACTGCACTAGGATTATTCTCGAGAATAATCCTAGTGCAGTGTCATTTTTG-3′;
shSRC-3#96 (TRCN0000365196): 5′-CCGGTTCCACCTCCTAGGGATATAACTCGAGTTATATCCCTAGGAGGTGGAATTTTTG-3′.
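The oligonucleotides above appear to follow the usual TRC pLKO hairpin layout (a CCGG overhang, a 21-nt sense stem, a CTCGAG loop, the antisense stem and a poly-T terminator); that layout is an assumption here, not something stated in the text. The sketch below parses each oligo on that assumption and checks that the antisense stem is the reverse complement of the sense stem, which is a cheap way to catch copying errors in such sequences. The helper names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_trc_hairpin(oligo: str, stem: int = 21, overhang: str = "CCGG",
                      loop: str = "CTCGAG") -> tuple[str, str]:
    """Split an (assumed) TRC-style hairpin oligo into (sense, antisense) stems."""
    oligo = oligo.upper()
    if not oligo.startswith(overhang):
        raise ValueError("missing 5' CCGG overhang")
    start = len(overhang)
    sense = oligo[start:start + stem]
    loop_start = start + stem
    if oligo[loop_start:loop_start + len(loop)] != loop:
        raise ValueError("loop not found at the expected position")
    anti_start = loop_start + len(loop)
    antisense = oligo[anti_start:anti_start + stem]
    if antisense != revcomp(sense):
        raise ValueError("antisense stem is not the reverse complement of the sense stem")
    # Terminator: a run of at least five T residues followed by a final G.
    if not re.fullmatch(r"T{5,}G", oligo[anti_start + stem:]):
        raise ValueError("unexpected 3' terminator")
    return sense, antisense


SHRNA_OLIGOS = {
    "shPFK#09": "CCGGGCTGATTGGCTGCCACATTTCCTCGAGGAAATGTGGCAGCCAATCAGCTTTTTTG",
    "shPFK#20": "CCGGGCGCAGCTCTTAGGTGTTCACCTCGAGGTGAACACCTAAGAGCTGCGCTTTTTTG",
    "shSRC-3#21": "CCGGTGACACTGCACTAGGATTATTCTCGAGAATAATCCTAGTGCAGTGTCATTTTTG",
    "shSRC-3#96": "CCGGTTCCACCTCCTAGGGATATAACTCGAGTTATATCCCTAGGAGGTGGAATTTTTG",
}

for name, oligo in SHRNA_OLIGOS.items():
    sense, _ = parse_trc_hairpin(oligo)
    print(name, sense)
```

Fixed offsets are used instead of searching for CTCGAG, because the loop sequence could in principle also occur inside a stem.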

Cell culture. HeLa, HEK293T, MDA-MB-231, MCF-7 and MCF-7-ERE-MAR-Luc cells
were cultured in DMEM (Gibco) supplemented with 10% FBS; SK-BR-3 cells were
grown in McCoy’s medium with 10% FBS; and MCF-10A cells were grown in
DMEM/F12 (Gibco) supplemented with 5% horse serum, epidermal growth factor
(EGF), hydrocortisone, cholera toxin and insulin. All cell lines were
incubated at 37 °C and 5% CO2. Cell lines were obtained from ATCC, and were
maintained and tested yearly for mycoplasma contamination by the Tissue
Culture Core, Baylor College of Medicine (BCM).

Stable cells expressing shRNAs were generated by lentiviral transduction in
the presence of polybrene (8 µg ml−1). Polyclonal pooled populations of
stable cells were selected in the presence of puromycin (1 µg ml−1) for
more than three passages before any functional experiments were initiated.

Human kinome library screen. A high-throughput RNAi screen was performed
using the Stealth RNAi human kinase library (Life Technologies), which
targets each of 636 human kinases with three individual siRNAs directed at
different regions of the gene, arrayed in twenty-four 96-well plates. To
identify the kinases that modulate SRC-3 transcriptional activity, we
reverse co-transfected HeLa cells with pBIND or pBIND-SRC-3 (2 ng per well)
along with the pG5luc firefly-luciferase reporter (100 ng per well), and
either control siRNA targeting GFP (siGFP) or siRNAs targeting kinases
(40 nM). The mixture was incubated with 0.75 µl of Lipofectamine 2000 per
well for 20 min, after which HeLa cell suspension (12,500 cells per well)
in complete growth medium (DMEM plus 10% FBS) was added on top. After 48 h
of culture, plates were carefully washed with PBS and luminescence was
recorded on a luminometer (Berthold) using the Dual-Luciferase Assay System
(Promega). Additional wells on all plates contained appropriate controls:
cells transfected with pBIND and siGFP, or with pBIND-SRC-3 and siGFP,
along with the reporter plasmid. SRC-3 transcriptional activity was
calculated by comparing the relative luciferase units (RLU) of pBIND-SRC-3
with pBIND readings in cells transfected with siGFP. The firefly luciferase
reading from each well was normalized to its Renilla reading (the pBIND
vector backbone contains the Renilla luciferase gene) to adjust for
variations in transfection efficiency. The fold change in SRC-3 activity
upon suppression of kinases was calculated by comparison with siGFP
readings, followed by robust z-score analysis to identify kinases that
either increase or decrease SRC-3 activity by more than 2 s.d. above or
below control siGFP (pBIND-SRC-3 plus siGFP) treatment. Fold change values
were converted to log2 for each set of siRNAs and then graphed in a 3D plot.
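The normalization and hit-calling steps described above can be sketched as follows. This is a minimal illustration, assuming that per-well activities have already been Renilla-normalized, that fold changes are taken relative to the siGFP control, and that the robust z-score uses the median and median absolute deviation (MAD) of the log2 fold changes across the library. The text does not say which distribution the robust z-score was computed over, and all function names are hypothetical.

```python
import numpy as np


def renilla_normalized(firefly, renilla):
    """Per-well firefly/Renilla ratio (pBIND carries Renilla luciferase),
    correcting for differences in transfection efficiency."""
    return np.asarray(firefly, dtype=float) / np.asarray(renilla, dtype=float)


def robust_z(values):
    """Robust z-score: (x - median) / (1.4826 * MAD).

    The 1.4826 factor scales the MAD to approximate one standard deviation
    for normally distributed data."""
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return (x - med) / (1.4826 * mad)


def call_kinase_hits(kinase_activity, sigfp_activity, threshold=2.0):
    """log2 fold change versus siGFP, robust z-scores and a boolean hit mask."""
    log2_fc = np.log2(np.asarray(kinase_activity, dtype=float) / float(sigfp_activity))
    z = robust_z(log2_fc)
    return log2_fc, z, np.abs(z) > threshold


# Hypothetical normalized SRC-3 activities for eight kinase knockdowns (siGFP = 1.0).
activity = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 8.0, 0.125]
log2_fc, z, hits = call_kinase_hits(activity, sigfp_activity=1.0)
print(np.round(z, 2), hits)
```

The median and MAD are used instead of the mean and s.d. so that strong hits do not inflate the spread they are being compared against.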

Cell proliferation assays. Cells were transfected with the indicated siRNAs
and seeded at a density of 3,000 cells per well of a 96-well plate in
complete growth medium. For rescue experiments, cells were seeded in
complete growth medium supplemented with dialysed serum with or without
purines (10 µM adenosine, Sigma, and 10 µM guanosine, Sigma). After 4 days,
cells were stained with CellTiter96 (Promega) reagent, followed by
measurement of absorbance at 490 nm. For the clonogenic survival assays,
1,000 cells per well were plated in a 6-well plate, incubated for 7 days
and stained with crystal violet. The medium was changed every two days.

In vitro phosphorylation assays. Full-length SRC-3-Flag protein was
expressed in Sf9 cells and purified using anti-Flag antibody beads. The
SRC-3 fragments were expressed as GST fusion proteins in Escherichia coli
and purified using a GST fusion protein purification kit (Life
Technologies) following the manufacturer’s protocol. Each in vitro
phosphorylation reaction was carried out with varying amounts of purified
recombinant GST-PFKFB4 protein (SignalChem) (0.1-1 µg) along with SRC-3
(0.25 µg) as substrate, cold ATP (0.2 mM) or 5 µCi [γ-³²P]ATP (Perkin
Elmer), and 1× kinase buffer (Cell Signaling) in a total volume of 30 µl.
The reaction was carried out at 30 °C for 30 min and then stopped by adding
10 µl of 4× SDS sample buffer. Proteins were resolved by SDS-PAGE, stained
with Coomassie blue (Bio-Rad), and visualized by autoradiography or probed
with anti-Ser/Thr antibody (BD Biosciences). For mass spectrometric (MS)
identification of phosphorylation sites, the GST-SRC-3-CID protein was used
as a substrate for the kinase reaction along with cold ATP and PFKFB4
enzyme, followed by separation by SDS-PAGE and staining with Coomassie
blue. Gel lanes were sliced into bands and digested in-gel overnight at
37 °C with trypsin. After digestion, peptides were extracted twice in
200 µl of acetonitrile, with re-suspension in 20 µl of 2% formic acid
before the second extraction, dried in a Savant SpeedVac, and dissolved in
a 5% methanol, 0.1% formic acid solution. The samples were then analysed
by mass spectrometry to detect phosphorylated residues.

Cell culture treatment conditions, protein isolation and immunoblotting.
For siRNA treatments, cells were lysed 72 h after transfection. Stable
cells were grown to 80% confluency before protein was extracted. For
nutritional stress conditions, stable cells were cultured in complete
medium to 80% confluency, followed by a brief starvation (3 h) in
glucose-free growth medium. Cells were then switched to glucose-free DMEM
supplemented with 10% dialysed serum and 5 mM or 25 mM glucose, as
indicated in the figures, for 24 h before cells were lysed. For glucose
withdrawal, cells were cultured in 25 mM glucose for 24 h and then switched
to medium containing 5 mM glucose for an additional 6 h. For FBP treatment,
glucose-starved cells were pre-treated with 10 µM streptolysin O (Sigma,
S5265) to permeabilize the cells, followed by the addition of FBP (Santa
Cruz, sc-214805) as previously described. Immunoblotting was performed as
previously described. In brief, cells were lysed using NP-40 lysis buffer
(Life Technologies) along with protease and phosphatase inhibitor cocktail
(Millipore). Total protein was estimated using a BCA protein estimation kit
(Pierce), and approximately 40 µg of protein was separated on 4-12%
Bis-Tris gels (Life Technologies) and electroblotted onto nitrocellulose
membranes using the iBlot system (Life Technologies). Blots were blocked
for 2 h at room temperature or overnight at 4 °C in 1× TBS buffer (Bio-Rad)
supplemented with 0.1% Tween-20 (Sigma) and either 5% bovine serum albumin
(BSA) or 5% non-fat dry milk (Bio-Rad). Blots were incubated overnight at
4 °C with primary antibody diluted in TBST containing 1% BSA or 5% non-fat
dry milk. Blots were subsequently washed three times for 10 min in TBST and
incubated with HRP-coupled secondary antibody (Promega). Blots were washed
as previously described, reacted with ECL reagents (Thermo Fisher
Scientific) and detected by chemiluminescence (UVP Biospectrum).
Semi-quantitative levels of each band were analysed by densitometry using
UVP Vision Works LS software, and the relative values normalized to actin
are indicated numerically under each lane.

Antibodies used for immunoblotting in this study were: mouse monoclonal
SRC-3 (611105, BD Biosciences), rabbit monoclonal SRC-3 (2126, Cell
Signaling), Flag (F3165, Sigma-Aldrich), mouse phospho-serine/threonine
(612548, BD Biosciences), rabbit PFKFB4 (137785 and 71622, Abcam), mouse
PFKFB4 (TA500809, Origene), rabbit monoclonal ATF4 (11815, Cell Signaling),
and HRP-conjugated β-actin (A3854, Sigma-Aldrich). The phospho-SRC-3
(Ser857) rabbit monoclonal antibody was a cell culture supernatant produced
from a hybridoma generated by immunizing animals with a synthetic peptide
containing phosphorylated Ser857 of human SRC-3. This antibody (clone 10A6)
was a gift from Cell Signaling Technology.

Immunoprecipitations. 293T cells cultured in 100-mm dishes to 80%
confluency were transfected with Flag-SRC-3 followed by infection with
PFKFB4 adenovirus (Signagen Laboratories). Twenty-four hours after
infection, the medium was changed and cells were incubated overnight in
different concentrations of glucose (5, 10, 15 and 25 mM) in glucose-free
DMEM supplemented with 10% dialysed FBS. For MDA-MB-231 cells, stable cells
expressing shRNAs targeting PFKFB4 or SRC-3, or SRC-3-depleted cells
expressing SRC-3(Ser857Ala), were grown in 5 mM or 25 mM glucose. Cells
were lysed in NP-40 lysis buffer (Invitrogen) supplemented with protease
and phosphatase inhibitor cocktail (Millipore). For co-immunoprecipitations,
lysates were precleared with control Protein A/G agarose beads (Pierce).
Five hundred micrograms of protein were then used for overnight pull-down
assays with monoclonal anti-Flag (F3165, Sigma) or anti-phospho-Ser/Thr
antibody (BD Biosciences). The beads were then captured and washed, and the
immunoprecipitated proteins were eluted and subjected to immunoblotting,
along with a 2% input sample run in parallel. For ATF4 pull-down, anti-ATF4
(11815, Cell Signaling) was used at a 1:250 dilution. Light-chain-specific
anti-rabbit secondary antibody conjugated to HRP (Jackson ImmunoResearch,
1:5,000) was used to detect ATF4 in immunoblotting following
immunoprecipitation, to avoid overlap with the IgG heavy chain.

Immunohistochemistry. Immunohistochemistry was performed as previously
described. Mouse monoclonal anti-human Ki-67 antibody MIB-1 (Dako) and
rabbit monoclonal anti-phospho-SRC-3 (Ser857) (Cell Signaling) were used to
stain the lung sections, followed by anti-mouse or anti-rabbit Alexa-594
secondary antibody (Molecular Probes).

Gene expression analyses. Total RNA was isolated from cancer cells or
tumours using the RNeasy Kit (QIAGEN). Reverse transcription was carried out
using a SuperScript VILO cDNA synthesis kit (Invitrogen) according to the
manufacturer’s instructions. For gene expression analysis, qPCR was
performed using the TaqMan system (Roche) with sequence-specific primers and
the Universal Probe Library (Roche). ACTB was used as an internal control.
Melt curve analysis was performed to ensure that a single PCR product was
produced in each well. We used three biological replicates for each
treatment group. Data were analysed using the comparative CT method (ΔΔCT).
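For reference, the comparative CT (ΔΔCT) calculation with ACTB as the internal control can be written as below. This sketch assumes roughly 100% amplification efficiency for both the target and reference assays, so that one cycle corresponds to a two-fold change; that is the standard assumption of the ΔΔCT method. The function name is illustrative.

```python
def ddct_fold_change(ct_target_test: float, ct_actb_test: float,
                     ct_target_ctrl: float, ct_actb_ctrl: float) -> float:
    """Relative expression 2**(-ddCT), normalizing each sample to ACTB."""
    dct_test = ct_target_test - ct_actb_test  # delta CT in the treated sample
    dct_ctrl = ct_target_ctrl - ct_actb_ctrl  # delta CT in the control sample
    ddct = dct_test - dct_ctrl
    return 2.0 ** (-ddct)


print(ddct_fold_change(22.0, 18.0, 24.0, 18.0))  # -> 4.0
```

In this example the target is detected two cycles earlier in treated cells at unchanged ACTB, giving ΔΔCT = −2 and a four-fold increase. With three biological replicates, the ΔCT values are typically averaged per group before the final exponentiation.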

Targeted TCA, glycolysis, PPP and nucleotide synthesis, and intermediary
metabolite analysis using liquid chromatography-mass spectrometry. Sample
preparation for mass spectrometric analysis: the metabolome extraction
method described earlier was used for the cell lines in this study. In
brief, cells were thawed at 4 °C and subjected to three freeze-thaw cycles
in liquid nitrogen to rupture the cell membrane. Following this, 750 µl of
ice-cold methanol:water (4:1) containing 20 µl of spiked internal standard
was added to each sample. The cells were homogenized for 1 min (two 30-s
pulses), mixed with 450 µl of ice-cold chloroform and vortexed in a
Multi-Tube Vortexer for 10 min. The resulting homogenate was mixed with
150 µl of ice-cold water and vortexed again for 2 min. The homogenate was
incubated at −20 °C for 20 min and centrifuged at 4 °C for 10 min to
partition the aqueous and organic layers. The aqueous and organic layers
were separated and dried at 37 °C for 45 min in an Automatic Environmental
SpeedVac system (Thermo Fisher Scientific). The aqueous extract was
reconstituted in 500 µl of ice-cold methanol:water (50:50) and filtered
through a 3-kDa molecular filter (Amicon Ultracel-3K Membrane, Millipore
Corporation) at 4 °C for 90 min to remove proteins. The filtrate was dried
at 37 °C for 45 min in a SpeedVac and stored at −80 °C until mass
spectrometry analysis. Before mass spectrometry analysis, the dried extract
was re-suspended in 100 µl of methanol:water (50:50) containing 0.1% formic
acid and analysed using multiple reaction monitoring (MRM).

Liquid chromatography-mass spectrometry. HPLC analysis was performed using
an Agilent 1290 series HPLC system equipped with a degasser, binary pump,
thermostatted autosampler and column oven (all from Agilent Technologies).
Relative metabolite levels were measured by MRM after normal-phase
chromatographic separation. All samples were kept at 4 °C, and 5 µl of each
sample was used for analysis.

Separation of TCA, glycolysis and PPP-associated metabolites. Normal-phase
chromatographic separation was also used for targeted identification of
metabolites. This analysis used water modified by the addition of 5 mM
ammonium acetate (pH 9.9) as solvent A and 100% acetonitrile as solvent B.
The binary pump flow rate was 0.2 ml min−1, with a gradient from 80% B to
2% B over 20 min, followed by 2% B to 80% B over 5 min and then 80% B for
13 min. The flow rate was gradually increased during the separation:
0.2 ml min−1 (0-20 min), 0.3 ml min−1 (20.1-25 min), 0.35 ml min−1
(25-30 min) and 0.4 ml min−1 (30-37.99 min), and was finally set at
0.2 ml min−1 (5 min). Metabolites were separated on a Luna Amino (NH2)
column (4 µm, 100 Å, 2.1 × 150 mm, Phenomenex) maintained in a
temperature-controlled chamber (37 °C). All the columns used in this study
were washed and reconditioned after every 50 injections. Ten microlitres
was injected and analysed using a 6495 QQQ triple quadrupole mass
spectrometer (Agilent Technologies) coupled to a 1290 series HPLC system via
selected reaction monitoring (SRM). Metabolites were measured in negative
ionization mode with an electrospray ionization (ESI) voltage of −3,500 V.
Approximately 9-12 data points were acquired per detected metabolite.
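The normal-phase gradient programme described above can be expressed as a small timetable. The sketch below linearly interpolates %B between the breakpoints stated in the text and looks up the listed flow rates. Treating the flow-rate changes as step changes (the text says the flow was "gradually increased") and rounding the 20.1-min boundary to 20 min are assumptions, so the timetable should be checked against the actual instrument method before any reuse.

```python
import numpy as np

# %B breakpoints (min, %B) read from the text: 80 -> 2% B over 0-20 min,
# 2 -> 80% B over 20-25 min, then 80% B held for 13 min (to 38 min).
GRADIENT_T = [0.0, 20.0, 25.0, 38.0]
GRADIENT_B = [80.0, 2.0, 80.0, 80.0]

# Flow-rate windows (start, end, ml/min) as listed; modelled as steps (assumption).
FLOW_STEPS = [(0.0, 20.0, 0.2), (20.0, 25.0, 0.3), (25.0, 30.0, 0.35), (30.0, 38.0, 0.4)]
FINAL_FLOW = 0.2  # the closing 5-min segment at 0.2 ml/min


def percent_b(t_min: float) -> float:
    """Solvent B (%) at time t, by linear interpolation between breakpoints."""
    if not GRADIENT_T[0] <= t_min <= GRADIENT_T[-1]:
        raise ValueError("time outside the gradient programme")
    return float(np.interp(t_min, GRADIENT_T, GRADIENT_B))


def flow_rate(t_min: float) -> float:
    """Flow rate (ml/min) at time t under the step-change assumption."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    for start, end, flow in FLOW_STEPS:
        if start <= t_min < end:
            return flow
    return FINAL_FLOW


for t in (0, 10, 20, 22.5, 25, 30, 38):
    print(f"{t:>5} min  {percent_b(t):5.1f}% B  {flow_rate(t)} ml/min")
```

Keeping the programme as data rather than code makes it easy to adapt to the nucleotide method, which uses a 2% to 98% B gradient over 15 min.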

Separation of nucleotides. For measurement of nucleotides and
deoxynucleotides, the dried extract was suspended in 50 µl of
methanol:water (50:50) containing 0.1% formic acid before mass
spectrometry analysis. Samples were delivered to the MS via reverse-phase
chromatography using an RRHD SB-CN column (1.8 µm, 3.0 × 100 mm, Agilent
Technologies) at 300 µl min−1. The gradient spanned 2% B to 98% B over
15 min, followed by 98% B to 2% B over 1 min. The gradient was continued
for a further 4 min to re-equilibrate the column. Buffers A and B consisted
of 0.1% formic acid in water and acetonitrile, respectively.

Ten microlitres was injected and analysed using a 6495 QQQ triple
quadrupole mass spectrometer (Agilent Technologies) coupled to a 1290
series HPLC system via SRM. Metabolites were measured in positive
ionization mode with an ESI voltage of 4,000 V. Approximately 9-12 data
points were acquired per detected metabolite.

Isotope labelling and profiling by targeted mass spectrometry.
[6-¹³C]glucose and [U-¹³C]glucose were purchased from Cambridge Isotope
Laboratories. MDA-MB-231 cells were grown in 10-cm dishes in regular medium
to 80% confluence, followed by brief (3 h) starvation and then addition of
25 mM [6-¹³C]glucose in glucose-free DMEM supplemented with 10% dialysed
FBS and 1% penicillin/streptomycin. For [U-¹³C]glucose, cells were fed with
steady-state isotope tracers for 48 h, and the medium was replaced 2 h
before metabolome collection and/or isotope tracer addition. Culture medium
was collected, and cells were washed with PBS, counted and snap-frozen in
liquid nitrogen. Cells were scraped into 0.5 ml of 1:1 water:methanol,
sonicated for 1 min (two 30-s pulses), and then mixed with 450 µl ice-cold
chloroform. The resulting homogenate was then mixed with ice-cold water and
vortexed again for 2 min. The homogenate was incubated at −20 °C and
centrifuged at 4 °C for 10 min to partition the aqueous and organic layers.
The aqueous and organic layers were combined and dried at 37 °C for 45 min
in an Automatic Environmental SpeedVac system (Thermo Fisher Scientific).
The extract was reconstituted in 500 µl of ice-cold methanol:water (1:1)
and filtered through a 3-kDa molecular filter (Amicon Ultracel 3-kDa
Membrane) at 4 °C for 90 min to remove proteins. The filtrate was dried at
37 °C for 45 min in a SpeedVac and stored at −80 °C until mass spectrometry
analysis. Before MS analysis, the dried extract was resuspended in 50 µl of
methanol:water (1:1) containing 0.1% formic acid and then analysed using
MRM. Ten microlitres was injected and analysed using a 6490 QQQ triple
quadrupole mass spectrometer (Agilent Technologies) coupled to a 1290
Series HPLC system via SRM. Metabolites were targeted in both positive and
negative ion modes: the ESI voltage was 4,000 V in positive ion mode and
−3,500 V in negative ion mode. Approximately 9-12 data points were acquired
per detected metabolite. To target TCA flux, the samples were delivered to
the mass spectrometer via normal-phase chromatography using a Luna Amino
column (4 µm, 100 Å, 2.1 × 150 mm). To target fatty-acid flux, the samples
were delivered to the mass spectrometer via reverse-phase chromatography
using a Phenyl Hexyl column (3 µm, 100 Å, 2.1 × 150 mm). For ¹³C-labelled
experiments, SRM was performed for the expected ¹³C incorporation in
various forms for targeted liquid chromatography-tandem mass spectrometry
(LC-MS/MS). The mass isotopomer distribution (MID) was calculated and
corrected for natural abundance.
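Natural-abundance correction of an MID is commonly done with a correction matrix. The sketch below is a minimal, carbon-only version. It assumes a ¹³C natural abundance of about 1.07% and a fully pure tracer, and it ignores contributions from other elements (H, N, O and any derivatization atoms). It is therefore a simplification of what dedicated correction tools do, and not necessarily the procedure used in this study.

```python
from math import comb

import numpy as np

P_13C = 0.0107  # approximate natural abundance of 13C (assumed value)


def correction_matrix(n_carbons: int, p: float = P_13C) -> np.ndarray:
    """M[i, j] = probability of observing M+i when j carbons carry tracer 13C."""
    size = n_carbons + 1
    m = np.zeros((size, size))
    for j in range(size):
        unlabelled = n_carbons - j  # positions that can still carry natural 13C
        for i in range(j, size):
            k = i - j  # extra mass units contributed by natural 13C
            m[i, j] = comb(unlabelled, k) * p**k * (1 - p) ** (unlabelled - k)
    return m


def correct_mid(observed, n_carbons: int, p: float = P_13C) -> np.ndarray:
    """Natural-abundance-corrected MID (fractions summing to 1)."""
    obs = np.asarray(observed, dtype=float)
    obs = obs / obs.sum()
    corrected = np.linalg.solve(correction_matrix(n_carbons, p), obs)
    corrected = np.clip(corrected, 0.0, None)  # remove small negative artefacts
    return corrected / corrected.sum()


# Hypothetical observed MID for a 3-carbon metabolite (M+0 ... M+3).
print(np.round(correct_mid([0.60, 0.25, 0.10, 0.05], n_carbons=3), 4))
```

The matrix is lower triangular with a non-zero diagonal, so it is always invertible. Each column sums to 1, because it lists all possible natural-isotope outcomes for a given labelling state.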

Proximity ligation assay. Interaction between endogenous SRC-3 and PFKFB4
was detected by proximity ligation assay (PLA) using the Duolink In Situ
Red Starter Kit Mouse/Rabbit (DUO92101, Sigma) according to the
manufacturer’s instructions. In brief, MDA-MB-231 cells were seeded in a
35-mm glass-bottom culture dish (P35G-0-14C, MatTek Corporation), and after
reaching 80% confluency, cells were fixed and then blocked for 1 h with
Duolink Blocking Solution at 37 °C. Cells were then incubated with primary
antibodies against SRC-3 (rabbit monoclonal, Cell Signaling) and PFKFB4
(mouse monoclonal, Origene), either alone or in combination. After
incubation, cells were washed, and Duolink PLA PLUS and MINUS probes were
added for 1 h at 37 °C. After the unbound probes were washed off, cells were
incubated first with the ligase and then with DNA polymerase to amplify the
DNA circle. Finally, cells were mounted using Duolink In Situ Mounting
Medium with DAPI and analysed by microscopy. Images were obtained using a
Zeiss Axio Observer A1 inverted microscope with an N-Achroplan 100×/1.25
oil objective, a Zeiss MRC5 camera and AxioVision Rel. 4.8 software.

Analysis of ATF4 and SRC-3 cistromes and motif analysis of ATF4-bound
sequences. Owing to the lack of ATF4 and SRC-3 ChIP-seq datasets in breast
cancer cell lines, we compared an in-house SRC-3 ChIP-seq dataset from
mouse liver with previously published ATF4 ChIP-seq data from mouse
embryonic fibroblasts. Although this comparison is less than ideal because
the SRC-3 and ATF4 ChIP-seq experiments were performed in different
tissues, the co-localization of the SRC-3 and ATF4 cistromes across
different tissues nevertheless argues for an interplay between them, a
finding subsequently confirmed by co-immunoprecipitation and ChIP-qPCR
assays in human breast cancer cell lines. ATF4-binding motifs in the
promoter regions of the XDH, TKT and AMPD1 genes were identified using the
MISP (Motif-based Interval Screener with PSSM) toolbox in Galaxy Cistrome
with a P-value cut-off of 0.005. The consensus ATF4-binding motif used as
input was TGATGCAA.
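MISP scores sequences with a position-specific scoring matrix (PSSM) and a P-value threshold. As a much simpler stand-in, the sketch below only looks for exact matches to the TGATGCAA consensus on both strands of a promoter sequence. It illustrates strand-aware motif scanning and does not reimplement MISP: it will miss degenerate sites that a PSSM would still score above threshold. The function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_consensus(seq: str, motif: str = "TGATGCAA") -> list[tuple[int, str]]:
    """0-based start positions of exact motif matches on either strand.

    A '-' hit at position i means the reverse complement of the motif
    occupies seq[i:i + len(motif)] on the given (+) strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = revcomp(motif)
    width = len(motif)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


print(scan_consensus("GGTGATGCAAGGTTGCATCAGG"))  # -> [(2, '+'), (12, '-')]
```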

ChIP. The following antibodies were used for ChIP: SRC-3 (Cell Signaling or
BD Biosciences), ATF4 (Santa Cruz C-20, and 11815 Cell Signaling),
pSRC-3-S857 (Cell Signaling) and rabbit IgG. ChIP assays were performed
using an EZ ChIP kit (Millipore) with some modifications. In brief,
MDA-MB-231 cells were grown in 15-cm dishes to 80% confluency. For glucose
stimulation, cells were glucose-deprived for 3 h by incubation in
glucose-free DMEM supplemented with 10% FBS, followed by 4 h stimulation
with 5 mM or 25 mM glucose. Cells were cross-linked in 1% formaldehyde and
quenched with 125 mM glycine. Chromatin was sheared by sonication using a
Branson Sonifier, precleared with control IgG antibodies and agarose beads
(Millipore), and then immunoprecipitated with IgG (control), SRC-3,
pSRC-3-S857 and ATF4 antibodies. DNA fragments were eluted from the beads
and reverse-crosslinked, and the purified DNA was used in qPCR reactions
with SYBR green (Applied Biosystems) to determine promoter occupancy. Melt
curve analysis was performed to verify that all SYBR green reactions
produced a single PCR product.

Luciferase assays. Luciferase assays were performed from whole-cell lysates 
made in Cell Culture Lysis reagent (Promega) using the Luciferase Reporter Assay 
(Promega) and a Berthold 96-well plate reader. Luciferase values were normalized 
to the total protein level. 

Metabolomic phenotyping microarrays. Screening was performed using 96-well
phenotype microarray plates (Biolog) containing 88 different carbon
substrates and 5 nucleotides as the energy source. MCF10A cells were
infected with adenovirus expressing GFP or SRC-3, and seeded in triplicate
at an initial density of 2 × 10⁴ cells per well. Biolog Redox Dye Mix MA
was added to each well according to the manufacturer’s instructions, and
kinetic usage of the metabolites was monitored using the GEN III OmniLog ID
System (Biolog).

Human breast tumours. The breast tumours and adjoining normal tissue were
obtained from the Lester and Sue Smith Breast Center at Baylor College of
Medicine under Institutional Review Board-approved protocol #H-7900.
Whole-cell lysates from a total of 14 ER+ primary human breast tumours,
along with matched normal tissues, were used to detect pSRC-3-Ser857, SRC-3
and PFKFB4 levels by immunoblotting.

Determining a common PFKFB4-SRC-3 proteomic signature. Protein lysates from
MDA-MB-231 cells stably expressing shRNAs targeting SRC-3 or PFKFB4 were
used for protein array analysis as described before. Proteins whose
expression was significantly altered by ablation of PFKFB4 or SRC-3,
compared with non-targeting control shRNA, were determined using a
parametric t-test as implemented in the Python SciPy statistical library.
Significance was assessed as P < 0.05, a fold change exceeding 1.25× and
normalized signal levels exceeding 200 U. A common proteomic signature was
determined by intersecting the significant proteins affected by each
treatment and imposing the restriction that the protein changes be in the
same direction.
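The filtering and intersection rules above can be sketched in Python with SciPy. Several details are assumptions made for illustration: the data layout (a dictionary of normalized replicate signals per protein), the use of the ratio of mean signals as the fold change, and applying the 200 U threshold to the larger of the two mean signals. `scipy.stats.ttest_ind` is SciPy's standard two-sample Student's t-test. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def significant_changes(control, knockdown, p_cut=0.05, fc_cut=1.25, min_signal=200.0):
    """Return {protein: +1 (up) or -1 (down)} for proteins passing all filters.

    control, knockdown: dicts mapping protein -> replicate normalized signals.
    """
    calls = {}
    for protein, ctrl_values in control.items():
        ctrl = np.asarray(ctrl_values, dtype=float)
        kd = np.asarray(knockdown[protein], dtype=float)
        p = stats.ttest_ind(kd, ctrl).pvalue  # two-sample Student's t-test
        ratio = kd.mean() / ctrl.mean()
        fold = max(ratio, 1.0 / ratio)  # magnitude of change in either direction
        if p < p_cut and fold > fc_cut and max(kd.mean(), ctrl.mean()) > min_signal:
            calls[protein] = 1 if ratio > 1 else -1
    return calls


def common_signature(calls_a, calls_b):
    """Proteins significant in both knockdowns and changed in the same direction."""
    return {p: d for p, d in calls_a.items() if calls_b.get(p) == d}
```

Usage: compute `significant_changes(control, sh_src3)` and `significant_changes(control, sh_pfkfb4)`, then pass both results to `common_signature`. A protein that goes up in one knockdown and down in the other is dropped, as the same-direction rule requires.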

Association of the PFKFB4-SRC-3 proteomic signature in human basal breast
cancer. We evaluated the association of the common PFKFB4-SRC-3 proteomic
signature with patient survival in a cohort of primary basal breast cancer
specimens collected by The Cancer Genome Atlas (TCGA) for which clinical
information has been collected. We first restricted the analysis to the
array proteins that were also measured by TCGA. Next, for each protein in
the PFKFB4-SRC-3 common proteomic signature and for each basal breast
cancer specimen, we computed the z-score of its expression within the
patient cohort. We then summed the z-scores for each specimen.
Specifically, the z-scores of the proteins suppressed by PFKFB4-SRC-3 (that
is, upregulated by PFKFB4 and SRC-3 shRNA) were subtracted from the
z-scores of the proteins induced by PFKFB4-SRC-3 (that is, downregulated by
PFKFB4 and SRC-3 shRNA); this yielded an activity score of the PFKFB4-SRC-3
common proteomic signature for each specimen. After computing the activity
scores, we further partitioned the patient cohort into specimens with a
high activity score (top 33% of the specimens) and specimens with a low
activity score (bottom 33% of the specimens). Association with survival was
considered significant using the log-rank test (P < 0.05) and the Cox
proportional hazards test (P < 0.05), as implemented in the survival
package of the R statistical system.
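The activity-score construction and high/low split described above can be sketched as follows. Several choices are assumptions: the data arrive as a dictionary of protein levels across the same ordered set of specimens, z-scores use the sample standard deviation, and "top 33%" means the top floor(n/3) specimens. The survival tests themselves (log-rank and Cox, run in R in the study) are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def zscore(values) -> np.ndarray:
    """Standardize one protein's levels across the specimen cohort."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def activity_scores(expr, induced, suppressed) -> np.ndarray:
    """Sum of z-scores of induced proteins minus those of suppressed proteins.

    expr: dict protein -> levels across specimens, all in the same specimen order.
    """
    n = len(next(iter(expr.values())))
    score = np.zeros(n)
    for protein in induced:  # proteins induced by PFKFB4-SRC-3
        score += zscore(expr[protein])
    for protein in suppressed:  # proteins suppressed by PFKFB4-SRC-3
        score -= zscore(expr[protein])
    return score


def high_low_groups(scores):
    """Indices of the top and bottom thirds of specimens by activity score."""
    k = len(scores) // 3
    if k == 0:
        raise ValueError("need at least three specimens")
    order = np.argsort(scores)
    return np.sort(order[-k:]), np.sort(order[:k])
```

The resulting high and low index sets would then be passed, with survival times and event indicators, to a log-rank test and a Cox model.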

Tumorigenicity and metastasis assays. All animal experiments were carried
out in accordance with a protocol approved by the Baylor College of
Medicine Institutional Animal Care and Use Committee, and experiments were
terminated once maximal tumour volumes were reached (10% of the animal body
weight). MDA-MB-231 breast cancer cells stably expressing luciferase were
individually transduced with shRNAs targeting SRC-3 and PFKFB4. For the
rescue experiment, either wild-type SRC-3 or the SRC-3(Ser857Ala) mutant
was re-expressed in SRC-3-ablated tumour cells, and the polyclonal pooled
population was selected. Approximately 2.5 x 10° cells were injected
orthotopically, along with Matrigel (BD Biosciences) (1:1 volume), into the
mammary fat pad of 5-6-week-old female athymic nude Foxn1-nu mice (Envigo).
The mammary tumour length (L) and width (W) were measured with a caliper,
and tumour volumes were calculated using the formula L × W² × π/6. After
six weeks, tumours were surgically resected, and the animals were monitored
weekly for lung metastasis progression, quantified by noninvasive
bioluminescence measurement with IVIS Lumina II equipment. Four weeks after
tumour resection, animals were euthanized and tissues were collected and
fixed in 4% PFA. Paraffin-embedded lung samples were also stained with
haematoxylin and eosin to reveal the size and number of lung macro- or
micro-metastases. The experiments were not randomized, and the
investigators were not blinded to allocation during experiments and
outcome assessment. No statistical methods were used to predetermine
sample size.
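The caliper-based volume formula above, V = L × W² × π/6, corresponds to an ellipsoid whose two shorter axes are both taken as W. A direct Python transcription follows; the function name is illustrative.

```python
import math


def tumour_volume(length: float, width: float) -> float:
    """Caliper volume estimate V = L * W**2 * pi / 6 (cube of the input unit)."""
    if length < 0 or width < 0:
        raise ValueError("dimensions must be non-negative")
    return length * width ** 2 * math.pi / 6


print(round(tumour_volume(10.0, 6.0), 1))  # -> 188.5
```

For a tumour measuring 10 mm by 6 mm, this gives 60π ≈ 188.5 mm³.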

Statistics. Unless otherwise indicated, all results represent the
mean ± s.d., and statistical comparisons between groups were performed using
the two-tailed Student’s t-test or one-way or two-way ANOVA with
appropriate corrections for multiple comparisons. For all statistical
analyses, differences with P < 0.05 were considered statistically
significant, and results from three biologically independent experiments
with similar results are reported. GraphPad Prism software version 6.0/7.0
(GraphPad Software) was used for data analysis.

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The ChIP-seq data have been submitted to the Gene Expression
Omnibus under accessions GSE35681 (for ATF4) and GSE67860 (for SRC-3).
Other data that support the findings of this study are available from the 
corresponding author upon reasonable request. 
Extended Data Fig. 1 | Kinome-wide screen identified potential
kinases regulating SRC-3 intrinsic transcriptional activity. a, HeLa
cells expressing varying concentrations of the pBIND or pBIND-SRC-3
constructs were used to measure SRC-3 activity. n = 4 biologically
independent samples. *P < 0.000001, one-way ANOVA with Sidak’s
multiple comparisons test. RLU are normalized to protein content.
b, HeLa cells expressing pBIND or pBIND-SRC-3 were treated with siRNA
targeting GFP or PRKCZ at the indicated dose, followed by a luciferase
assay to measure SRC-3 activity. n = 3 biologically independent samples.
*P < 0.000001, one-way ANOVA with Tukey’s multiple comparisons test.
c, Different control siRNAs targeting GFP or luciferase (Luc) were used to
measure SRC-3 activity in HeLa cells expressing pBIND or pBIND-SRC-3.
n = 3 biologically independent samples. The GFP control siRNAs in the
red box were used as controls in the library screen. d, Effect on SRC-3
transcriptional activity of three sets of siRNAs (sets A, B and C) targeting
636 human kinases in HeLa cells. The effect of GFP control siRNA was set at
1 (dotted line); the cut-off fold for increased activation was set at 2, and that
for reduced activity at 0.75, following z-score analysis. n = 3 siRNAs per
kinase and n = 6 siGFP per plate; in total, n = 1,908 (siRNAs targeting kinases)
and n = 144 (siGFP control) independent samples. e, SRC-3 activity in HeLa
cells across 24 kinome-library plates in the presence of control siRNA
targeting GFP. n = 6 biologically independent replicates for each plate.
f, A secondary screen was performed in HeLa cells to confirm the primary
screen hits using pooled siRNAs targeting the kinases, followed by measurement
of SRC-3 transcriptional activity. n = 3 biologically independent samples. Boxes are
as in Fig. 2e. g, Relative proliferation of MDA-MB-231 cells 4 days after
treatment with siRNAs targeting GFP (control), SRC-3 or the indicated
kinases. n = 3 biological replicates. *P < 0.0001, two-way ANOVA with
Dunnett’s multiple comparisons test. Unless stated otherwise, data are
mean ± s.d.
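The hit-calling rule in panel d (siGFP control set to 1, activation cut-off 2, reduction cut-off 0.75) can be illustrated with a short sketch. It normalizes each kinase siRNA reading to the mean siGFP reading on the same plate and applies the two fold-change cut-offs. The z-score step mentioned in the caption is not reproduced, whether the cut-offs are inclusive is an assumption, and all names and readings are hypothetical.

```python
from statistics import mean

# Cut-offs quoted in the caption, relative to the siGFP control (= 1).
# Treating them as inclusive (>= and <=) is an assumption.
ACTIVATION_CUTOFF = 2.0
REDUCTION_CUTOFF = 0.75

def call_hits(plate_readings, gfp_readings):
    """Classify kinase siRNAs on one plate by fold change versus siGFP.

    plate_readings : dict mapping siRNA name -> normalized luciferase signal
    gfp_readings   : siGFP control signals from the same plate
    Returns a dict mapping siRNA name -> (fold change, call).
    """
    control = mean(gfp_readings)
    calls = {}
    for sirna, signal in plate_readings.items():
        fold = signal / control
        if fold >= ACTIVATION_CUTOFF:
            call = "increased activity"
        elif fold <= REDUCTION_CUTOFF:
            call = "reduced activity"
        else:
            call = "no change"
        calls[sirna] = (fold, call)
    return calls


# Invented readings for illustration only (six siGFP wells per plate).
gfp = [100, 98, 102, 101, 99, 100]
kinases = {"kinase_A_si1": 250, "kinase_B_si2": 60, "kinase_C_si3": 110}
for name, (fold, call) in call_hits(kinases, gfp).items():
    print(f"{name}: {fold:.2f}-fold, {call}")
```

Normalizing within each plate, rather than to a global control, absorbs plate-to-plate variation of the kind shown in panel e.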
Extended Data Fig. 2 | PFKFB4, the top hit from the kinase screen,
enhances the transcriptional activity of SRC-3. a, Effect of PFKFB4
knockdown on SRC-3 transcriptional activity in various breast cancer cell
lines. n = 3 or n = 4 (siGFP plus pBIND-SRC-3) biologically independent
cells. *P < 0.000009, two-way ANOVA with Tukey’s multiple comparisons
test. b, SRC-3 transcriptional activity in MDA-MB-231 cells expressing
shRNAs targeting PFKFB4 (#09 and #20) or non-targeting control,
co-transfected with pBIND or pBIND-SRC-3. n = 5 biological replicates.
*P < 0.0001, one-way ANOVA with Tukey’s multiple comparisons test.
c, Protein expression of PFKFB4 or actin in MDA-MB-231 cells expressing
shRNAs targeting PFKFB4. d, Expression of PFKFB4 and SRC-3 mRNA
in the indicated breast tumour cells after treatment with siRNAs targeting
GFP control or PFKFB4. n = 4 or n = 3 biological replicates. See Source
Data for exact P values. e, Expression of PFKFB4 and SRC-3 mRNA in
MDA-MB-231 cells transduced with adenoviruses expressing GFP or
PFKFB4. n = 6 biologically independent cells. ***P < 0.000001, two-way
ANOVA with Tukey’s multiple comparisons test. f, Left, MDA-MB-231 cells
were stained with specific antibodies against SRC-3 (rabbit) and PFKFB4
(mouse) before proximity ligation assay (PLA). The PLA signals between
endogenous SRC-3 and PFKFB4 are shown in the red channel; DAPI was
used to stain the nuclei (blue), and the merged images show the overlay of
the red and blue channels. Two representative fields from n = 5 biologically
independent experiments are shown. Right, control cells were stained with
only one of the antibodies (against SRC-3 or PFKFB4) or with the
probe-conjugated secondary antibodies. Scale bars, 20 μm (left),
40 μm (right). Data are representative of three biologically independent
experiments with similar results, and are shown as mean ± s.d.
Extended Data Fig. 3 | PFKFB4 functions as a protein kinase by
phosphorylating SRC-3 at the Ser857 residue. a, In vitro PFKFB4
kinase assay in the presence of purified SRC-3 protein, F6P, ATP and
increasing concentrations of recombinant PFKFB4 enzyme, followed by
SDS-PAGE. Immunoblotting with pSer/Thr antibody shows the level
of phosphorylated SRC-3 protein. b, In vitro PFKFB4 kinase assay
in the presence of purified SRC-3 protein, PFKFB4 enzyme and varying
concentrations of F6P and ATP, followed by SDS-PAGE. Immunoblotting
with pSer/Thr antibody shows the level of pSRC-3 protein. c, Coomassie
blue stain showing the levels of GST-fused SRC-3 fragments used in the
in vitro kinase reactions performed in Fig. 2b. d, Proteomics analysis of an
in vitro kinase assay using the GST-SRC-3-CID fragment in the presence of
PFKFB4 enzyme and ATP, followed by mass spectrometric analyses. The mass
spectrum shows the phosphorylation peak in green. e, Proteomics analysis
of an in vitro kinase assay using a Ser857Ala-mutated GST-SRC-3-CID
protein in the presence of PFKFB4 enzyme and ATP, followed by mass
spectrometric analyses. No phosphorylation peaks were detected in the
mass spectrum of the Ser857Ala-mutated SRC-3-CID protein. f, Expression of
PFKFB1, PFKFB2, PFKFB3 and PFKFB4 in MDA-MB-231 cells expressing
shRNAs targeting PFKFB4 (#09 and #20). mRNA levels were normalized
to the internal housekeeping gene ACTB. n = 3 biological replicates. *P < 0.05,
two-way ANOVA with Tukey’s multiple comparisons test. g, Protein levels
of pSRC-3-Ser857, total SRC-3 and actin in MDA-MB-231 cells stably
expressing non-targeting control shRNA, SRC-3 shRNA, SRC-3 shRNA
plus the shRNA-resistant Ser857Ala SRC-3 mutant (shSRC-3 + S857A)
or SRC-3 shRNA plus wild-type SRC-3 (shSRC-3 + WT-SRC-3), cultured
in 25 mM glucose. Protein bands were quantified by ImageJ after
normalization to β-actin. h, MDA-MB-231 cells stably expressing
non-targeting shRNA or shRNA targeting PFKFB4 were grown in the presence
of 25 mM glucose or were glucose-starved for 4 h, followed by incubation
with streptolysin O for 5 min. FBP (10 μM) was added to glucose-starved
cells for an additional 1 h, followed by cell lysis and immunoblotting.
Protein bands were quantified by ImageJ after normalization to β-actin,
and the non-targeting shRNA lane was set to 1. i, Relative luciferase
activity showing the transcriptional activity of SRC-3 in MDA-MB-231
cells transduced with adenoviruses expressing GFP or PFKFB4 and cultured in
the presence of 5 mM, 15 mM or 25 mM glucose. n = 6 (pBIND) and n = 3
(pBIND-SRC-3) biological cell samples. *P < 0.000001, two-way ANOVA
with Tukey’s multiple comparisons test. Data in a–c and f–h are representative
of three biologically independent experiments with similar results, and data
in d, e are representative of two biologically independent experiments, each
run with three different reactions, all showing similar results and peptide
coverage. Data are mean ± s.d.
Extended Data Fig. 4 | Ser857 phosphorylation enhances SRC-3
transcriptional activity. a, Relative luciferase activity showing the activity
of wild-type SRC-3 and the Ser857Ala and Ser857Glu SRC-3 mutants in
MDA-MB-231 cells transduced with lentivirus expressing non-targeting
shRNA or PFKFB4 shRNA and cultured in the presence of 5 mM or 25 mM
glucose. n = 3 biological cell samples. *P < 0.000001, two-way ANOVA
with Tukey’s multiple comparisons test. b, Relative luciferase activity
showing the activity of SRC-3 in MDA-MB-231 cells stably expressing
lentiviral PFKFB4 shRNA and cultured in the presence of 25 mM glucose.
The cells were then co-transfected with empty vector, wild-type PFKFB4
or the PFKFB4 mutants Gly46Ala, Pro48Ala, Gly51Ala, Arg230Ala and
Arg238Ala. n = 6 biological cell samples. *P < 0.000001, two-way ANOVA
with Tukey’s multiple comparisons test. c, MDA-MB-231 cells stably
expressing shRNAs targeting PFKFB4 (231-shPFKFB4) were transfected
with constructs expressing empty vector (vector), wild-type PFKFB4
or the PFKFB4 mutants Gly46Ala, Pro48Ala, Gly51Ala, Arg230Ala and
Arg238Ala, and cultured in the presence of 25 mM glucose. Protein levels of
pSRC-3-Ser857, PFKFB4 and β-actin were detected by immunoblotting.
d, Relative luciferase activity showing the activity of oestrogen receptor-α
(ERα) in MCF7-Mar-luc cells transduced with lentivirus expressing
non-targeting shRNA or SRC-3 shRNA, cultured in the presence of 5 mM or
25 mM glucose and stimulated with 100 nM E2 or with ethanol control (−E2).
n = 3 biological cell samples. *P < 0.000001, two-way ANOVA with
Tukey’s multiple comparisons test. e, Relative luciferase activity showing
the activity of ERα in MCF7-Mar-luc cells transduced with adenovirus
expressing GFP or PFKFB4. Cells transduced with PFKFB4 adenovirus
were infected with non-targeting shRNA or SRC-3 shRNA after 2 days
and then cultured in the presence of 5 mM or 25 mM glucose and stimulated
with ethanol (−E2) or with 100 nM E2. n = 3 biological cell samples.
*P < 0.000001, two-way ANOVA with Tukey’s multiple comparisons test.
f, Survival assay in MCF7 cells showing the effect of non-targeting shRNA,
SRC-3 shRNA, and re-expression of wild-type SRC-3 or the SRC-3(Ser857Ala)
mutant in SRC-3-depleted cells cultured in charcoal-stripped medium
supplemented with 25 mM glucose and E2 for 7 days. n = 3 biologically
independent samples. All data are representative of three
independent experiments with similar results, and are shown as mean ± s.d.
Extended Data Fig. 5 | Increased glucose and purines are required for
SRC-3-dependent growth. a, Real-time measurement of the proliferation of
MCF10A cells transduced with adenoviruses expressing GFP or SRC-3 in
the presence of 93 different metabolites. n = 3 independent plates run
for each sample. b–d, Relative growth of MCF10A cells transduced with
adenoviruses expressing GFP or SRC-3 in the presence of D-glucose (b),
adenosine (c) and inosine (d). n = 6 biological cell samples. **P < 0.01,
***P < 0.001, ****P < 0.0001, two-tailed unpaired t-test. Boxes are as in
Fig. 2e. e, f, Relative levels of intermediary metabolites in MDA-MB-231
cells after treatment with shRNAs targeting PFKFB4 or SRC-3 compared
with control shRNA. e, Glycolytic and PPP metabolites. f, Nucleotides.
n = 3 biologically independent samples. *P < 0.05, two-way ANOVA with
Tukey’s multiple comparisons test. g, Total levels of purines in MCF10A
cells transduced with adenoviruses expressing GFP or SRC-3.
n = 3 biologically independent samples. *P < 0.05, ***P < 0.001, two-way
ANOVA with Tukey’s multiple comparisons test. See Source Data for exact
P values. Unless stated otherwise, data are mean ± s.d.
Extended Data Fig. 6 | SRC-3 drives the purine synthesis program
under conditions of active glycolysis. a, MDA-MB-231 cells stably
expressing control shRNA, PFKFB4 shRNA and SRC-3 shRNA were
fed with [6-¹³C]glucose. Ribulose/xylulose-5P (m+1) labelling from
[6-¹³C]glucose is shown. n = 3 biological cell samples. ***P = 0.00013,
****P = 0.000078, one-way ANOVA with Tukey’s multiple comparisons
test. b, Genes involved in oxidative and non-oxidative PPP. n = 3
biological cell samples. *P = 0.0431, two-way ANOVA with Sidak’s […]
ANOVA with Sidak’s multiple comparisons test. d, mRNA expression […]
cells transduced with adenovirus expressing GFP (control) and PFKFB4
cultured in the presence of 5 mM, 15 mM or 25 mM glucose. n = 3
biological cell samples. **P < 0.01, ***P < 0.001, ****P = 0.0001, two-way
ANOVA with Dunnett’s multiple comparisons test. e, f, MDA-MB-231
cells stably expressing control shRNA, PFKFB4 shRNA and SRC-3
shRNA were fed with [6-¹³C]glucose. Sedoheptulose-7P (m+1) (e) and
erythrose-4P (f) labelling from [6-¹³C]glucose are shown. n = 3 biological
cell samples. **P < 0.01, ****P = 0.001, two-way ANOVA with Dunnett’s […]
otherwise, data are mean ± s.d.
Extended Data Fig. 7 | Growth defect due to loss of SRC-3 or PFKFB4 is
rescued by exogenous purines. a, Expression of the metabolic
enzymes encoded by TKT, XDH, AMPD1 and SRC-3 in MDA-MB-231
cells expressing control shRNA, SRC-3 shRNA or SRC-3 shRNA plus
re-expression of shRNA-resistant wild-type SRC-3 protein (shSRC-3-21 +
WT-SRC-3). n = 4 biological cell samples. *P < 0.05, **P < 0.01, ***P < 0.001,
****P = 0.0001, two-way ANOVA with Tukey’s multiple comparisons test.
b, Relative proliferation of MDA-MB-231 cells expressing shRNA targeting
SRC-3 (shSRC-3#01 and shSRC-3#02) or non-targeting control shRNA after
treatment with siRNAs targeting luciferase (siLuc; as a control) or PFKFB4
under the conditions indicated. n = 6 samples from biologically independent
experiments. ****P < 0.000001, two-way ANOVA with Tukey’s multiple
comparisons test. c, MDA-MB-231 cells stably expressing control shRNA,
PFKFB4 shRNA or SRC-3 shRNA were fed with [U-¹³C]glucose for 48 h.
Adenosine ¹³C-labelling from [U-¹³C]glucose is shown. n = 3 samples from
biologically independent experiments. One-way ANOVA with Tukey’s
multiple comparisons test. Boxes are as in Fig. 2e. Data are representative of
three biologically independent experiments with similar results. See Source
Data for exact P values. Unless stated otherwise, data are mean ± s.d.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Fig. 8 | PFKFB4-SRC-3 stabilizes ATF4 transcription
factor to promote purine synthesis. a, Chromatin localization peaks
of SRC-3 and ATF4 on Tkt, Xdh and Ampd1 genes in mouse liver.
b, ATF4-binding peaks are conserved on three SRC-3 target purine
biosynthetic genes in both mouse and human genomes. c, Chromatin
immunoprecipitation (ChIP) of ATF4, total SRC-3 and pSRC-3-Ser857
from MDA-MB-231 cells treated with 5 mM or 25 mM glucose, compared
with an IgG isotype control. qPCR was performed to determine the amount of
promoter enrichment. d, ChIP-qPCR was performed in MDA-MB-231
cells cultured in 25 mM glucose and expressing SRC-3 shRNA, PFKFB4 shRNA
or control shRNA. n = 3 biological cell samples. *P < 0.01, **P < 0.0001,
***P < 0.00005, ****P < 0.000001, one-way ANOVA with Tukey’s
multiple comparisons test, compared with the 5 mM glucose groups (c) and
with the NT shRNA group (d). e, ChIP of ATF4, total SRC-3
(BD Biosciences antibody) and pSRC-3-Ser857 on the AMPD1 promoter
from MDA-MB-231 cells treated with non-targeting siRNA or siRNA
against ATF4 and cultured in the presence of 25 mM glucose, compared with
an IgG isotype control. qPCR was performed to determine the amount
of promoter enrichment. n = 3 biological cell samples. ***P < 0.001,
P < 0.000001, one-way ANOVA with Tukey’s multiple comparisons
test. f, mRNA expression of TKT, XDH, AMPD1 and SRC-3 in
MDA-MB-231 cells expressing control siRNA or siRNA targeting ATF4.
n = 3 biological cell samples. Two-way ANOVA with Sidak’s multiple
comparisons test. See Source Data for exact P values. Data are mean ± s.d.
Extended Data Fig. 9 | The PFKFB4-SRC-3 axis promotes breast
tumour growth and metastasis. a, Primary tumours resected after
6 weeks. b, Ki67 staining of primary tumours from animals injected
with MDA-MB-231 cells stably expressing control shRNA, SRC-3 shRNA
or PFKFB4 shRNA. Data are representative of five fields per slide from
n = 5 animals per group with similar findings. Scale bar, 100 μm.
c, Quantification of Ki67-positive cells in the tumour. n = 5 animals per
group, average of five fields counted from each slide. ****P = 0.0001,
one-way ANOVA with Dunnett’s multiple comparisons test. d, Primary tumour
growth in animals injected with MDA-MB-231 cells stably expressing
shRNA targeting SRC-3 or PFKFB4, or expressing wild-type SRC-3 or the
Ser857Ala mutant in the SRC-3-depleted cells. n = 5 animals per group.
*P < 0.000001, two-way ANOVA with Tukey’s multiple comparisons
test. e, Immunoblot showing the relative expression of SRC-3 in primary
tumours from MDA-MB-231 cells stably expressing control shRNA,
SRC-3 shRNA, or after re-expression of wild-type SRC-3 or the Ser857Ala
mutant in the SRC-3-depleted cells. Tumours from n = 5 animals per group
were pooled to generate the tumour lysate used for analysis. f, Photon flux
of animals from different groups. n = 5 animals for wild-type
SRC-3, the Ser857Ala mutant and PFKFB4 shRNA, and n = 4 animals for
SRC-3 shRNA. *P < 0.0001, one-way ANOVA with Dunnett’s multiple
comparisons test. Line shows median with range. g, mRNA expression
of three metabolic enzymes (TKT, XDH and AMPD1), SRC-3 and
PFKFB4 in the primary tumours. n = 5 animals per group. **P < 0.05,
***P < 0.001, ****P = 0.0001, two-way ANOVA with Tukey’s multiple
comparisons test. h, Expression of PFKFB4 in patients with breast cancer
across different subtypes compared to normal breast tissue. Normal
basal = 17; basal = 139; normal Her2 = 9; Her2 = 67; normal luminal
A = 62; luminal A = 418; normal luminal B = 21; luminal B = 186. The line in
the centre of the rectangle represents the median, the top edge of the
rectangle represents the third quartile, the bottom edge represents the
first quartile, the top whisker represents the maximum and the bottom
whisker represents the minimum. All data are representative of three
biologically independent experiments with similar results, and are shown as
mean ± s.d. unless otherwise stated. See Source Data for exact P values.
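The box-plot convention described for panel h (median line, box edges at the first and third quartiles, whiskers at the minimum and maximum) is a five-number summary. A minimal sketch using Python's standard `statistics` module follows; quartile definitions differ between software packages, and the "inclusive" method chosen here is an assumption, not necessarily the one used for the figure.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) as drawn in a min-max box plot.

    The 'inclusive' quartile method is an illustrative choice; other
    packages may interpolate quartiles differently.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median(data), q3, data[-1]


# Hypothetical expression values for one subtype.
print(five_number_summary([9, 1, 5, 3, 7, 2, 8, 4, 6]))
```

Because the whiskers here run to the extremes rather than to 1.5 × IQR, no points are plotted as outliers under this convention.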
Extended Data Fig. 10 | The PFKFB4-SRC-3 axis drives transcriptional
programming in patients with breast cancer. a, b, Expression of pSRC-3,
SRC-3 and PFKFB4 in ER+ breast tumour specimens and matched
adjoining normal tissues as detected by immunoblotting. n = 14 patients
with ER+ breast cancer. c, Semi-quantitative levels of the bands shown in a
and b, analysed by densitometry using UVP Vision Works LS software,
normalized relative to actin to calculate the fold change (tumour/normal)
and plotted to obtain the correlation between PFKFB4 and
pSRC-3-Ser857 expression. n = 14 normal and tumour tissues. R = 0.63,
P = 0.02, Spearman’s rank correlation coefficient. d, Log fold change in
protein expression of the PFKFB4-SRC-3 signature compared with the
control knockdown (non-targeting shRNA), as determined using a
parametric t-test as implemented in the Python SciPy library.
Thresholds of P < 0.05 and a fold change exceeding 1.25 were used to
classify true regulators of SRC-3 activity. n = 3 biologically independent
samples. e, Kaplan-Meier survival plot showing poor survival of patients
with basal-subtype (triple-negative) breast cancer exhibiting
increased expression of a common proteomic signature induced by
the PFKFB4-SRC-3 axis. The patient cohort was collected by
TCGA. P = 0.0365, log-rank test; P = 0.02971, Cox proportional hazards,
two-sided. f, Cartoon model describing the crosstalk between glycolysis
and purine generation, highlighting the essential steps regulated by
pSRC-3-Ser857. This PFKFB4-dependent SRC-3 phosphorylation enhances
mRNA expression of genes involved in purine metabolism, driving breast
tumour growth, proliferation and metastasis. AICAR,
5-aminoimidazole-4-carboxamide ribonucleotide; AMP, adenosine
monophosphate; F1,6-P, fructose 1,6-bisphosphate; IMP, inosine
monophosphate. g, Model showing that, in glycolytic breast tumours,
activated PFKFB4 drives SRC-3 phosphorylation at Ser857, which then
activates ER-positive primary tumour growth in conjunction with
E2-liganded ER, as well as ER-negative/recurrent tumours in conjunction
with ATF4, driving aggressive metastatic disease. Data are mean ± s.d.
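Panel c reports Spearman's rank correlation between the PFKFB4 and pSRC-3-Ser857 fold changes. As a sketch of what that coefficient measures, the Python below ranks each variable (averaging ranks for ties) and takes the Pearson correlation of the ranks. It returns the coefficient only, not a P value, the function names are hypothetical, and the example inputs are invented rather than the densitometry data behind R = 0.63.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented tumour/normal fold changes, for illustration only.
pfkfb4 = [1.2, 2.5, 0.9, 3.1, 1.8]
psrc3 = [1.0, 2.2, 1.1, 2.9, 1.5]
print(round(spearman_rho(pfkfb4, psrc3), 2))
```

Working on ranks makes the coefficient insensitive to the absolute scale of the densitometry values, which suits semi-quantitative immunoblot data.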
Fatal swine acute diarrhoea syndrome caused by an
HKU2-related coronavirus of bat origin 


Peng Zhou, Hang Fan, Tian Lan, Xing-Lou Yang, Wei-Feng Shi, Wei Zhang, Yan Zhu, Ya-Wei Zhang, Qing-Mei Xie,
Shailendra Mani, Xiao-Shuang Zheng, Bei Li, Jin-Man Li, Hua Guo, Guang-Qian Pei, Xiao-Ping An, Jun-Wei Chen,
Ling Zhou, Kai-Jie Mai, Zi-Xian Wu, Di Li, Danielle E. Anderson, Li-Biao Zhang, Shi-Yue Li, Zhi-Qiang Mi,
Tong-Tong He, Feng Cong, Peng-Ju Guo, Ren Huang, Yun Luo, Xiang-Ling Liu, Jing Chen, Yong Huang, Qiang Sun,
Lin-Fa Wang, Zheng-Li Shi, Yi-Gang Tong & Jing-Yun Ma


Cross-species transmission of viruses from wildlife animal
reservoirs poses a marked threat to human and animal health.
Bats have been recognized as one of the most important reservoirs
for emerging viruses, and the transmission of a coronavirus
that originated in bats to humans via intermediate hosts was
responsible for the high-impact emerging zoonosis severe acute
respiratory syndrome (SARS). Here we provide virological,
epidemiological, evolutionary and experimental evidence that 
a novel HKU2-related bat coronavirus, swine acute diarrhoea 
syndrome coronavirus (SADS-CoV), is the aetiological agent that 
was responsible for a large-scale outbreak of fatal disease in pigs in 
China that has caused the death of 24,693 piglets across four farms. 
Notably, the outbreak began in Guangdong province in the vicinity 
of the origin of the SARS pandemic. Furthermore, we identified 
SADS-related CoVs with 96-98% sequence identity in 9.8% (58 out 
of 591) of anal swabs collected from bats in Guangdong province 
during 2013-2016, predominantly in horseshoe bats (Rhinolophus 
spp.) that are known reservoirs of SARS-related CoVs. We found 
that there were striking similarities between the SADS and SARS 
outbreaks in geographical, temporal, ecological and aetiological 
settings. This study highlights the importance of identifying 
coronavirus diversity and distribution in bats to mitigate future 
outbreaks that could threaten livestock, public health and economic 
growth. 

The emergence of SARS in southern China in 2002, which was
caused by a previously unknown coronavirus (SARS-CoV) and
led to more than 8,000 human infections and 774 deaths
(http://www.who.int/csr/sars/en/), highlights two new frontiers in emerging
infectious diseases. First, it demonstrates that coronaviruses are capable
of causing fatal diseases in humans. Second, the identification of bats as
the reservoir for SARS-related coronaviruses, and the fact that SARS-CoV
probably originated in bats, firmly establish that bats are an
important source of highly lethal zoonotic viruses, such as Hendra,
Nipah, Ebola and Marburg viruses.

Here we report on a series of fatal swine disease outbreaks in 
Guangdong province, China, approximately 100 km from the location 
of the purported index case of SARS. Most strikingly, we found that 
the causative agent of this swine acute diarrhoea syndrome (SADS) is 
a novel HKU2-related coronavirus that is 98.48% identical in genome 
sequence to a bat coronavirus, which we detected in 2016 in bats in a 
cave in the vicinity of the index pig farm. This new virus (SADS-CoV)
originated from the same genus of horseshoe bats (Rhinolophus) as
SARS-CoV.

From 28 October 2016 onwards, a fatal swine disease outbreak was 
observed in a pig farm in Qingyuan, Guangdong province, China, 
very close to the location of the first known index case of SARS in 
2002, who lived in Foshan (Extended Data Fig. 1a). Porcine epidemic 
diarrhoea virus (PEDV, a coronavirus) had caused prior outbreaks at 
this farm, and was detected in the intestines of deceased piglets at the 
start of the outbreak. However, PEDV could no longer be detected in 
deceased piglets after 12 January 2017, despite accelerating mortality 
(Fig. 1a), and extensive testing for other common swine viruses was
negative (Extended Data Table 1). These findings suggested that this
was an outbreak of a novel disease. Clinical signs are similar to those
caused by other known swine enteric coronaviruses and include
severe, acute diarrhoea and acute vomiting, leading to death from
rapid weight loss in newborn piglets less than five days of
age. Infected piglets died 2–6 days after disease onset, whereas infected
sows suffered only mild diarrhoea and most sows recovered within 
two days. The disease caused no signs of febrile illness in piglets or 
sows. The mortality rate was as high as 90% in piglets that were five 
days or younger, whereas in piglets that were older than eight days, the 
mortality dropped to 5%. Subsequently, SADS-related outbreaks were 
found in three additional pig farms within 20-150 km of the index farm 
(Extended Data Fig. 1a) and, by 2 May 2017, the disease had caused 
the death of 24,693 piglets at these four farms (Fig. 1a). In farm A 
alone, 64% (4,659 out of 7,268) of all piglets that were born in February 
died. The outbreak has abated, and measures that were taken to control 
SADS included separation of sick sows and piglets from the rest of the 
herd. A qPCR test described below was used as the main diagnostic 
tool to confirm SADS-CoV infection. 

A sample collected from the small intestine of a diseased piglet was
analysed by metagenomic next-generation sequencing (NGS) to identify
potential aetiological agents. Of the 15,256,565
total reads obtained, 4,225 matched sequences of the bat CoV HKU2,
which was first detected in Chinese horseshoe bats in Hong Kong and
Guangdong province, China. By de novo assembly and targeted PCR,
we obtained a 27,173-bp CoV genome that shared 95% sequence identity
with HKU2-CoV (GenBank accession number NC_009988). Thirty-three
full genome sequences of SADS-CoV were subsequently obtained
(8 from farm A, 5 from farm B, 11 from farm C and 9 from farm D) that
were 99.9% identical to each other (Supplementary Table 1).
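Genome-level figures such as "95% sequence identity" and "99.9% identical" are percent identities computed over an alignment. The sketch below computes a simple version for two sequences that are assumed to be already aligned, with gaps written as "-", counting identical positions over columns where neither sequence has a gap. The helper name and the gap convention are assumptions for illustration; published values depend on the aligner and on how gaps are scored.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gap columns.

    Hypothetical helper: columns where either sequence has '-' are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy example: 9 of the 10 gap-free columns are identical.
print(percent_identity("ACGTACGTAC-", "ACGTACGTTCA"))  # 90.0
```

Counting gap columns as mismatches instead would lower the value; this is one reason identity figures from different tools are not always directly comparable.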
Fig. 1 | Detection of SADS-CoV infection in pigs in Guangdong, China.
a, Records of daily death toll on the four farms from 28 October 2016 to
2 May 2017. b, Detection of SADS-CoV by qPCR. The y axis shows the
log(copy number per 10° copies of 18S rRNA). n = 12 sick piglets, 5 sick
sows, 16 recovered sows and 10 healthy piglets. c, Tissue distribution of
SADS-CoV in diseased pigs. n = 3. Data are mean ± s.d.; dots represent
individual values. d, Detection of SADS-CoV antibodies. n = 46 sows
from which serum was first taken within the first three weeks of the outbreak
(First bleed), n = 8 sows from which serum was taken again (Second
bleed) more than one month after the onset of the outbreak, n = 8 sera
from healthy control pigs and n = 35 human sera from pig farmers.

Using qPCR targeting the nucleocapsid gene (Supplementary
Table 2), we detected SADS-CoV in acutely sick piglets and sows, but
not in recovered or healthy pigs on the four farms, nor in nearby farms
that showed no evidence of SADS. The virus replicated to higher titres
in piglets than in sows (Fig. 1b). SADS-CoV displayed a tropism for
the small intestine (Fig. 1c), as observed for other swine enteric
coronaviruses. Retrospective PCR analysis revealed that SADS-CoV
was present on farm A during the PEDV epidemic, with the first
strongly positive SADS-CoV sample detected on 6 December 2016.
From mid-January onwards, SADS-CoV was the dominant viral agent
detected in diseased animals (Extended Data Fig. 1b). It is possible
that the presence of PEDV early in the SADS-CoV outbreak
facilitated or enhanced the spillover and amplification of SADS.
However, the fact that the vast majority of piglet mortality occurred
after PEDV infection had become undetectable suggests that SADS-CoV
itself causes a lethal infection in pigs that was responsible for these
large-scale outbreaks, and that PEDV does not directly contribute to
its severity in individual pigs. This was supported by the absence of
PEDV and other known swine diarrhoea viruses during the peak and
later phases of the SADS outbreaks in the four farms (Extended Data
Table 1).

We rapidly developed an antibody assay based on the S1 domain of
the spike (S) protein using a luciferase immunoprecipitation system.
Because SADS occurs acutely and has a rapid onset in piglets, serological
investigation was conducted only in sows. Among 46 recovered
sows tested, 12 were seropositive for SADS-CoV within three weeks
of infection (Fig. 1d). To investigate possible zoonotic transmission,
serum samples from 35 farm workers who had close contact with sick
pigs were also analysed using the same luciferase immunoprecipitation
system approach, and none were positive for SADS-CoV.

Although the overall genome identity of SADS-CoV and HKU2-CoV
is 95%, the S gene sequence identity is only 86%, suggesting that
the previously reported HKU2-CoV is not the direct progenitor of
SADS-CoV, but that they may have originated from a common ancestor.
To test this hypothesis, we developed a SADS-CoV-specific qPCR
assay based on its RNA-dependent RNA polymerase (RdRp) gene
(Supplementary Table 2) and screened 591 bat anal swabs collected
between 2013 and 2016 from seven different locations in Guangdong
province (Extended Data Fig. 1a). A total of 58 samples (9.8%) tested
positive (Extended Data Table 2), all from Rhinolophus spp. bats,
which are also the natural reservoir hosts of SARS-related coronaviruses.
Four complete genome sequences with the highest RdRp
PCR-fragment sequence identity to that of SADS-CoV were determined
by NGS. They are very similar in size (27.2 kb) to
SADS-CoV (Fig. 2a), and we tentatively call them SADS-related
coronaviruses (SADSr-CoV). Overall sequence identity between SADSr-CoV
and SADS-CoV ranges from 96 to 98%. Most importantly, the S protein
of SADS-CoV shared more than 98% sequence identity with sequences
of two of the SADSr-CoVs (samples 162149 and 141388), compared with
86% with HKU2-CoV. The major sequence differences among the four
SADSr-CoV genomes were found in the predicted coding regions of
the S, NS7a and NS7b genes (Fig. 2a). In addition, the coding region
of the S protein N-terminal (S1) domain was determined from 19 bat
SADSr-CoVs to enable more detailed phylogenetic analysis.

[Fig. 2a genome maps: SADS-CoV, 27,173 bp; 162140-CoV, 27,177 bp, 98.48% identity; 141388-CoV, 27,174 bp, 98.05% identity; 8462-CoV, 27,200 bp, 96.36% identity; 8495-CoV, 27,198 bp, 96.28% identity; HKU2-CoV-GD, 27,165 bp, 95.09% identity.]

Fig. 2 | Genome and phylogenetic analysis of SADS-CoV and SADSr-CoV.
a, Genome organization and comparison. Colour-coding for different
genomic regions is as follows: green, non-structural polyproteins ORF1a and
ORF1b; yellow, structural proteins S, E, M and N; blue, accessory proteins
NS3a, NS7a and NS7b; orange, untranslated regions. The level of sequence
identity of SADSr-CoV to SADS-CoV is illustrated by different patterns

The phylogeny of S1 and the full-length genome revealed a high 
genetic diversity of alphacoronaviruses among bats and strong coev- 
olutionary relationships with their hosts (Fig. 2b and Extended Data 
Fig. 2), and showed that SADS-CoVs were more closely related to 
SADSr-CoVs from Rhinolophus affinis than from Rhinolophus sinicus, 
in which HKU2-CoV was found. Both phylogenetic and haplotype net- 
work analyses demonstrated that the viruses from the four farms proba- 
bly originated from their reservoir hosts independently (Extended Data 
Fig. 3), and that a few viruses might have undergone further genetic 
recombination (Extended Data Fig. 4). However, molecular clock 
analysis of the 33 SADS-CoV genome sequences failed to establish a 
positive association between sequence divergence and sampling date. 
Therefore, we speculate that either the virus was introduced into pigs 
from bats multiple times, or that the virus was introduced into pigs 
once, but subsequent genetic recombination disturbed the molecular 
clock. 
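The molecular clock test above (run with TempEst, per the Methods) amounts to regressing each tip's root-to-tip distance on its sampling date: a clear positive slope indicates clock-like accumulation of divergence, whereas a flat slope or near-zero R² does not. The minimal Python sketch below shows that ordinary least-squares regression; the function name and the dates and distances are hypothetical illustrations, not data or code from this study.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared). A slope at or below zero, or an
    R^2 near zero, indicates no usable clock-like signal in the data.
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    x_bar, y_bar = mean(dates), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, distances))
    syy = sum((y - y_bar) ** 2 for y in distances)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 is undefined when all distances are equal; report 0 in that case.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical decimal-year dates and root-to-tip distances (substitutions/site).
dates = [2017.05, 2017.10, 2017.20, 2017.30]
clock_like = [0.0010, 0.0012, 0.0016, 0.0020]
no_signal = [0.0015, 0.0015, 0.0015, 0.0015]

print(root_to_tip_regression(dates, clock_like))
print(root_to_tip_regression(dates, no_signal))
```

In practice the distances come from a maximum likelihood tree (RAxML in this study), which is why a recombination event that distorts branch lengths can mask a temporal signal.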

For viral isolation, we tried to culture the virus in a variety of cell lines 
(see Methods for details) using intestinal tissue homogenates as starting 
material. Cytopathogenic effects were observed in Vero cells only after 
five passages (Extended Data Fig. 5a, b). The identity of SADS-CoV was 
verified in Vero cells by immunofluorescence microscopy (Extended 
Data Fig. 5c, d) and by whole-genome sequencing (GenBank accession 
number MG557844). Similar results were obtained by other groups”. 

Known coronavirus host cell receptors include angiotensin- 
converting enzyme 2 (ACE2) for SARS-related CoV, aminopeptidase N 
(APN) for certain alphacoronaviruses, such as human (H)CoV-229E, 
and dipeptidyl peptidase 4 (DPP4) for Middle East respiratory syn- 
drome (MERS)-CoV~**, To investigate the receptor usage of SADS- 
CoV, we tested live or pseudotyped SADS-CoV infection on HeLa 
cells that expressed each of the three molecules. Whereas the positive controls worked for SARS-related CoV and for MERS-CoV pseudovirus, we found no evidence of enhanced infection or entry for SADS-CoV, suggesting that none of these molecules functions as an entry receptor for SADS-CoV (Extended Data Table 3).

[Fig. 2b host colour key: Rhinolophus affinis; Rhinolophus sinicus; Rhinolophus rex; pig.]

of boxes: solid colour, highly similar; dotted fill, moderately similar; dashed fill, least similar. b, Phylogenetic analysis of 57 S1 sequences (33 from SADS-CoV and 24 from SADSr-CoV). Different colours represent different host species as shown on the left. Scale bar, nucleotide substitutions per site.

To fulfill Koch’s postulates for SADS-CoV, two different types of animal challenge experiments were conducted (see Methods for details). The first challenge experiment was conducted with specific
pathogen-free piglets that were infected with a tissue homogenate of 
SADS-CoV-positive intestines. Two days after infection, 3 out of 7 
animals died in the challenge group whereas 4 out of 5 survived in 
the control group. Incidentally, the one piglet that died in the control 
group was the only individual that did not receive colostrum due to a 
shortage in the supply. It is thus highly likely that lack of nursing and 


Fig. 3 | Immunohistopathology of SADS-CoV-infected tissues. a-d, Sections of jejunum tissue from control (a, c) and infected (b, d) farm piglets four days after inoculation were stained with haematoxylin and eosin (a, b) or with rabbit anti-SADSr-CoV N serum (red), DAPI (blue) and mouse antibodies against the epithelial cell markers cytokeratin 8, 18 and 19 (green) (c, d). SADS-CoV N protein is evident in epithelial cells and deeper in the tissue of infected piglets, which exhibit villus shortening. Scale bars, 200 μm (a, b) and 50 μm (c, d). The experiment was conducted three times independently with similar results.


12 APRIL 2018 | VOL 556 | NATURE | 257 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


inability to access colostrum was responsible for the death (Extended Data Table 4). For the second challenge, healthy piglets were acquired from a farm in Guangdong that had been free of diarrhoeal disease for a number of weeks before the experiment, and were infected with the cultured isolate of SADS-CoV or with tissue-culture medium as a control. Of those inoculated with SADS-CoV, 50% (3 out of 6) died between 2 and 4 days after infection, whereas all control animals survived (Extended Data Table 5). All animals in the infected group suffered watery diarrhoea, rapid weight loss and intestinal lesions (determined after euthanasia at the end of the experiment; Extended Data Tables 4 and 5). Histopathological examination revealed marked villus atrophy in SADS-CoV-inoculated farm piglets four days after inoculation but not in control piglets (Fig. 3a, b), and viral N protein-specific staining was observed mainly in small intestine epithelial cells of the inoculated piglets (Fig. 3c, d).

The current study highlights the value of proactive viral discovery 
in wildlife, and targeted surveillance in response to an emerging infec- 
tious disease event, as well as the disproportionate importance of bats 
as reservoirs of viruses that threaten veterinary and public health’. It 
also demonstrates that by using modern technological platforms, such 
as NGS, luciferase immunoprecipitation system serology and phyloge- 
netic analysis, key experiments that traditionally rely on the isolation of 
live virus can be performed rapidly before virus isolation. 


Online content 

Any Methods, including any statements of data availability and Nature Research reporting summaries, along with any additional references and Source Data files, are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0004-7.
METHODS


Sample collection. Bats were captured and sampled in their natural habitat in 
Guangdong province (Extended Data Fig. 1) as described previously*. Faecal swab 
samples were collected in viral transport medium (VTM) composed of Hanks’ balanced salt solution at pH 7.4 containing BSA (1%), amphotericin (15 μg ml−1), penicillin G (100 units ml−1) and streptomycin (50 μg ml−1). Stool samples from sick
pigs were collected in VTM. When appropriate and feasible, intestinal samples were 
also taken from deceased animals. Samples were aliquoted and stored at -80 °C 
until use. Blood samples were collected from recovered sows and workers on the 
farms who had close contact with sick pigs. Serum was separated by centrifugation 
at 3,000g for 15 min within 24 h of collection and preserved at 4 °C. Human serum 
collection was approved by the Medical Ethics Committee of the Wuhan School of 
Public Health, Wuhan University and Hummingbird IRB. Humans, pigs and bats
were sampled without gender or age preference unless indicated (for example, 
piglets or sows). No statistical methods were used to predetermine sample size. 
Virus isolation. The following cells were used for virus isolation in this study: Vero 
(cultured in DMEM and 10% FBS); Rhinolophus sinicus primary or immortalized 
cells generated in our laboratory (all cultured in DMEM/F12 and 15% FBS): kidney 
primary cells (RsKi9409), lung primary cells (RsLu4323), lung immortalized cells 
(RsLuT), brain immortalized cells (RsBrT) and heart immortalized cells (RsHeT); 
and swine cell lines: two intestinal porcine enterocyte cell lines, IPEC (RPMI 1640 and 10% FBS) and SIEC (DMEM and 10% FBS); three kidney cell lines, PK15, LLC-PK1 (DMEM and 10% FBS for both) and IBRS (MEM and 10% FBS); and one pig testis cell line, ST (DMEM and 10% FBS). All cell lines tested free of mycoplasma contamination, and their species identity was confirmed and authenticated by microscopic morphological evaluation. None of the cell lines was on the ICLAC list of commonly misidentified cell lines.

Cultured cell monolayers were maintained in their respective medium. PCR-positive pig faecal samples or the supernatant from homogenized pig intestine (in 200 μl VTM) were spun at 8,000g for 15 min, filtered and diluted 1:2 with DMEM supplemented with 16 μg ml−1 trypsin before addition to the cells. After incubation at 37 °C for 1 h, the inoculum was removed and replaced with fresh culture medium containing antibiotics (below) and 16 μg ml−1 trypsin. The cells were incubated at 37 °C and observed daily for cytopathic effect (CPE). Four blind passages (three-day interval between every passage) were performed for each sample. After each passage, both the culture supernatant and cell pellet were examined for the presence of virus by RT-PCR using the SADS-CoV primers listed in Supplementary Table 2. Penicillin (100 units ml−1) and streptomycin (15 μg ml−1) were included in all tissue culture media.
RNA extraction, S1 gene amplification and qPCR. Whenever commercial kits were used, the manufacturer’s instructions were followed without modification. RNA was extracted from 200 μl of swab samples (bat), faeces or homogenized intestine (pig) with the High Pure Viral RNA Kit (Roche). RNA was eluted in 50 μl of elution buffer and used as the template for RT-PCR. Reverse transcription
was performed using the SuperScript III kit (Thermo Fisher Scientific). 

To amplify S1 genes from bat samples, nested PCR was performed with prim- 
ers designed based on HKU2-CoV (GenBank accession number NC_009988.1)!” 
(Supplementary Table 2). The 25-μl first-round PCR mixture contained 2.5 μl 10× PCR reaction buffer, 5 pmol of each primer, 50 mM MgCl2, 0.5 mM dNTP, 0.1 μl Platinum Taq Enzyme (Thermo Fisher Scientific) and 1 μl cDNA. The 50-μl second-round PCR mixture was identical to the first-round PCR mixture except for the primers. Amplification of both rounds was performed as follows: 94 °C for 5 min followed by 60 cycles at 94 °C for 30 s, 50 °C for 40 s, 72 °C for 2.5 min, and a final extension at 72 °C for 10 min. PCR products were gel-purified and sequenced.

For qPCR analysis, primers based on SADS-CoV RdRp and N genes were used 
(Supplementary Table 2). RNA extracted from above was reverse-transcribed using 
PrimeScript RT Master Mix (Takara). The 10-μl qPCR reaction mix contained 5 μl 2× SYBR Premix Ex Taq II (Takara), 0.4 μM of each primer and 1 μl cDNA.
Amplification was performed as follows: 95 °C for 30 s followed by 40 cycles at 
95 °C for 5 s, 60 °C for 30 s, and a melting curve step. 
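The Methods do not state how Ct values were converted into the copy numbers reported in Extended Data Fig. 1b. One common approach, assumed here purely for illustration, is an external standard curve of Ct against log10(copies), from which amplification efficiency is E = 10^(−1/slope) − 1. The slope, intercept and function names below are hypothetical.

```python
def amplification_efficiency(slope):
    """Efficiency E from the standard-curve slope (Ct vs log10 copies): E = 10^(-1/slope) - 1."""
    if slope >= 0:
        raise ValueError("a valid standard curve has a negative slope")
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert Ct = slope * log10(copies) + intercept to estimate template copies."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standard curve; a slope near -3.32 corresponds to ~100% efficiency.
SLOPE, INTERCEPT = -3.32, 38.0

print(round(amplification_efficiency(SLOPE), 3))
print(copies_from_ct(21.4, SLOPE, INTERCEPT))
```

Normalizing to copies per milligram of intestine would further require the tissue mass and the fraction of the extract carried into each reaction, which are not given here.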

Luciferase immunoprecipitation system assay. The SADS-CoV S1 gene was 
codon-optimized for eukaryotic expression, synthesized (GenScript) and cloned 
in frame with the Renilla luciferase gene (Rluc) and a Flag tag in the pREN2 
vector”!, pREN2-S1 plasmids were transfected into Cos-1 cells using Lipofectamine 
2000 (Thermo Fisher Scientific). At 48 h post-transfection, cells were collected, 
lysed and a luciferase assay was performed to determine Rluc expression for both 
the empty vector (pREN2) and the pREN2-S1 construct. For testing of unknown 
pig or human serum samples, 1 μl of serum was incubated with 10 million units of Rluc alone (vector) or Rluc-S1, respectively, together with 3.5 μl of a 30% protein A/G UltraLink resin suspension (Pierce, Thermo Fisher Scientific). After extensive washing to remove unbound luciferase-tagged antigens, the amount of captured luciferase was determined using a commercial luciferase substrate kit (Promega). The ratio of Rluc-S1:Rluc (vector) was used to determine the specific S1 reactivity of pig and human sera. Commercial Flag antibody (Thermo Fisher Scientific) was used as the positive control, and various pig sera (from uninfected animals in China or Singapore, or from pigs infected with PEDV, TGEV or Nipah virus) were used as negative controls.
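The specific S1 reactivity described above is a ratio of two luciferase readings for the same serum. The minimal Python sketch below computes that ratio and compares it with a positivity cutoff. The cutoff rule (mean plus three standard deviations of negative-control ratios), the function names and all light-unit values are assumptions for illustration; the cutoff used in this study is not stated here.

```python
from statistics import mean, stdev


def s1_reactivity(rluc_s1_signal, rluc_vector_signal):
    """Specific S1 reactivity: signal with Rluc-S1 divided by signal with Rluc alone."""
    if rluc_vector_signal <= 0:
        raise ValueError("vector-only signal must be positive")
    return rluc_s1_signal / rluc_vector_signal


def hypothetical_cutoff(negative_ratios, k=3.0):
    """Assumed positivity cutoff: mean + k standard deviations of negative-control ratios."""
    if len(negative_ratios) < 2:
        raise ValueError("need at least two negative controls")
    return mean(negative_ratios) + k * stdev(negative_ratios)


# Hypothetical luciferase readings (S1 signal, vector signal) for negative-control sera.
negative_readings = [(1200, 1000), (900, 1000), (1100, 1000), (1000, 1000)]
negatives = [s1_reactivity(s1, vec) for s1, vec in negative_readings]
cutoff = hypothetical_cutoff(negatives)

test_serum = s1_reactivity(25000, 1000)  # hypothetical convalescent sow serum
print(f"cutoff={cutoff:.2f}, ratio={test_serum:.1f}, positive={test_serum > cutoff}")
```

Using a ratio rather than the raw S1 signal cancels serum-specific background binding to Rluc itself, which is the purpose of running the empty-vector reading in parallel.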

Protein expression and antibody production. The N gene from SADSr-CoV 3755 (GenBank accession number MF094702), which shares 98% amino acid sequence identity with the SADS-CoV N protein, was inserted into pET-28a+ (Novagen) for prokaryotic expression. Transformed Escherichia coli were grown at 37 °C for 12-18 h in medium containing 1 mM IPTG. Bacteria were collected by centrifugation, resuspended in 30 ml of 5 mM imidazole and lysed by sonication. The lysate, from which N protein expression was confirmed with an anti-His-tag antibody, was applied to Ni2+ resin (Thermo Fisher Scientific). The purified N protein, at a concentration of 400 μg ml−1, was used to immunize rabbits for antibody production following published methods27. After immunization and two boosts, rabbits were euthanized and sera were collected. Rabbit anti-N protein serum was used at 1:10,000 for subsequent western blots.

Amplification, cloning and expression of human and swine genes. Construction 
of expression clones for human ACE2 in pcDNA3.1 has been described 
previously>”®. Human DPP4 was amplified from human cell lines. Human APN 
(also known as ANPEP) was commercially synthesized. Swine APN (also known 
as ANPEP), DPP4 and ACE2 were amplified from piglet intestine. Full-length 
gene fragments were amplified using specific primers (provided upon request). 
Human ACE2 was cloned into pcDNA3.1 fused with a His tag. Human APN and DPP4, and swine APN, DPP4 and ACE2, were cloned into pCAGGS fused with an S tag. Purified plasmids were transfected into HeLa cells. After 24 h, expression of the human or swine genes in HeLa cells was confirmed by immunofluorescence assay using mouse anti-His tag or mouse anti-S tag monoclonal antibodies (produced in house) followed by Cy3-labelled goat anti-mouse/rabbit IgG (Proteintech Group).
Pseudovirus preparation. The codon-humanized S genes of SADS-CoV or MERS- 
CoV cloned into pcDNA3.1 were used for pseudovirus construction as described 
previously>”*. In brief, 15 jug of each pHIV-Luc plasmid (pNL4.3.Luc.R-E-Luc) 
and the S-protein-expressing plasmid (or empty vector control) were co-trans- 
fected into 4 x 10° HEK293T cells using Lipofectamine 3000 (Thermo Fisher 
Scientific). After 4 h, the medium was replaced with fresh medium. Supernatants 
were collected 48 h after transfection and clarified by centrifugation at 3,000g, then 
passed through a 0.45-μm filter (Millipore). The filtered supernatants were stored at −80 °C in aliquots until use. To evaluate the incorporation of S proteins into the core of HIV virions, pseudoviruses in supernatant (20 ml) were concentrated by ultracentrifugation through a 20% sucrose cushion (5 ml) at 80,000g for 90 min using an SW41 rotor (Beckman). Pelleted pseudoviruses were dissolved in 50 μl phosphate-buffered saline (PBS) and examined by electron microscopy.
Pseudovirus infection. HeLa cells transiently expressing APN, ACE2 or DPP4 
were prepared using Lipofectamine 2000 (Thermo Fisher Scientific). Pseudoviruses 
prepared above were added to HeLa cells overexpressing APN, ACE2 or DPP4 24 h after transfection. Unadsorbed virus was removed and replaced with fresh medium 3 h after infection. Infection was monitored by measuring the luciferase activity conferred by the reporter gene carried by the pseudovirus, using the Luciferase Assay System (Promega) as follows: cells were lysed 48 h after infection, and 20 μl of the lysate was used to determine luciferase activity after the addition of 50 μl of luciferase substrate.
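Receptor usage in this assay is judged by whether luciferase signal rises in receptor-expressing cells relative to mock-transfected cells. The short Python sketch below expresses that comparison as a fold change over mock; the twofold threshold, the function names and the relative light unit (RLU) values are hypothetical and only illustrate the comparison, not the analysis criteria used in the study.

```python
from statistics import mean


def fold_over_mock(receptor_rlu, mock_rlu):
    """Mean luciferase signal (RLU) in receptor-transfected wells over mock-transfected wells."""
    baseline = mean(mock_rlu)
    if baseline <= 0:
        raise ValueError("mock-transfected signal must be positive")
    return mean(receptor_rlu) / baseline


def entry_enhanced(receptor_rlu, mock_rlu, threshold=2.0):
    """Hypothetical rule: call entry 'enhanced' if the fold change exceeds `threshold`."""
    return fold_over_mock(receptor_rlu, mock_rlu) > threshold


mock = [100, 110, 90]                   # hypothetical RLU, mock-transfected HeLa
positive_control = [5000, 5200, 4800]   # hypothetical pseudovirus on its known receptor
no_enhancement = [95, 105, 100]         # hypothetical pseudovirus with no receptor usage

for label, wells in [("positive control", positive_control), ("test", no_enhancement)]:
    print(label, fold_over_mock(wells, mock), entry_enhanced(wells, mock))
```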

Examination of known CoV receptors for SADS-CoV entry/infection. 
HeLa cells transiently expressing APN, ACE2 or DPP4 were prepared using 
Lipofectamine 2000 (Thermo Fisher Scientific) in a 96-well plate, with mock- 
transfected cells as controls. SADS-CoV grown in Vero cells was used to infect 
HeLa cells transiently expressing APN, ACE2 or DPP4. After 1 h of adsorption, the inoculum was removed and the cells were washed twice with PBS and supplemented with
medium. SARS-related-CoV WIV16’ and MERS-CoV HIV-pseudovirus were used 
as positive controls for human/swine ACE2 and human/swine DPP4, respectively.
After 24 h of infection, cells were washed with PBS and fixed with 4% formalde- 
hyde in PBS (pH 7.4) for 20 min at room temperature. SARS-related-CoV WIV16 
replication was detected using rabbit antibody against the SARS-related-CoV 
Rp3 N protein (made in house, 1:100) followed by Cy3-conjugated goat anti-rab- 
bit IgG (1:50, Proteintech)’. SADS-CoV replication was monitored using rabbit 
antibody against the SADSr-CoV 3755 N protein (made in house, 1:50) followed by 
FITC-conjugated goat anti-rabbit IgG (1:50, Proteintech). Nuclei were stained with 
DAPI (Beyotime). Staining patterns were examined using confocal microscopy on 
a FV1200 microscope (Olympus). Infection of MERS-CoV HIV-pseudovirus was 
monitored by luciferase 48 h after infection. 

High-throughput sequencing, pathogen screening and genome assembly. Tissue 
from the small intestine of deceased pigs was homogenized and filtered through 0.45-μm filters before nucleic acid extraction, and ribosomal RNA was depleted using the NEBNext rRNA Depletion Kit (New England Biolabs). Metagenomic analysis of both RNA and DNA viruses was performed. For RNA virus screening, the sequencing library was constructed using the Ion Total RNA-Seq Kit v2 (Thermo Fisher Scientific). For DNA virus screening, the NEBNext Fast DNA Fragmentation & Library Prep Set for Ion Torrent (New England Biolabs) was used for library preparation. Both libraries were sequenced on an Ion S5 sequencer (Thermo Fisher Scientific). An analysis pipeline was applied to the sequencing data, which included the following steps: (1) raw data quality filtering; (2) host genomic sequence filtering; (3) BLASTn search against the virus nucleotide database using BLAST; (4) BLASTx search against the virus protein database using DIAMOND v.0.9.0; (5) contig assembly and BLASTx search against the virus protein database. For whole viral genome sequencing, amplicon primers (provided upon request) were designed using the Thermo Fisher Scientific online tool with the HKU2-CoV and the SADS-CoV farm A genomes as references, and the sequencing libraries were constructed using the NEBNext Ultra II DNA Library Prep Kit for Illumina and sequenced on a MiSeq sequencer. PCR and Sanger sequencing were performed to fill gaps in the genome. Genome sequences were assembled using CLC Genomics Workbench v.9.0. 5’-RACE was performed to determine the 5’ end of the genomes using the SMARTer RACE 5’/3’ Kit (Takara). Genomes were annotated using Clone Manager Professional Suite 8 (Sci-Ed Software).

Phylogenetic analysis. SADS-CoV genome sequences and other representative 
coronavirus sequences (obtained from GenBank) were aligned using MAFFT 
v.7.221. Phylogenetic analyses with full-length genome, S gene and RdRp 
were performed using MrBayes v.3.2. Markov chain Monte Carlo was run for 
20-50 million steps using the GTR+G+I model (general time reversible model of nucleotide substitution with a proportion of invariant sites and γ-distributed rates among sites). The first 10% was removed as burn-in. The association between phylogenies and phenotypes (for example, host species and farms) was assessed by BaTS beta-build2, with the trees obtained in the previous step used as input. For SADS-CoVs, a median-joining network analysis was performed using PopART v.1.7, with ε = 0. Phylogenetic analysis of the 33 full-length SADS-CoV genome
sequences was performed using RAxML v.8.2.11, with GTRGAMMA as the nucle- 
otide substitution model and 1,000 bootstrap replicates. The maximum likelihood 
tree was used to test the molecular clock using TempEst v.1.5. Potential genetic 
recombination events in our datasets were detected using RDP v.4.72. 
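Discarding the first 10% of the Markov chain Monte Carlo run as burn-in, as described above, simply means ignoring the earliest samples drawn before the chain has converged. MrBayes handles this through its own burn-in settings; the Python sketch below only illustrates the operation on a list of samples held in sampling order, and the sampling frequency in the example is a hypothetical choice.

```python
def discard_burnin(samples, fraction=0.10):
    """Drop the leading `fraction` of an MCMC chain (samples in sampling order)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_burn = int(len(samples) * fraction)
    return samples[n_burn:]


# Hypothetical: 20 million steps sampled every 1,000 steps gives 20,000 samples.
chain = list(range(20_000))
kept = discard_burnin(chain)
print(len(kept), kept[0])
```

The retained samples are the ones summarized into the consensus tree and posterior probabilities.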

Animal infection studies. Experiments were carried out strictly in accordance 
with the recommendations of the Guide for the Care and Use of Laboratory 
Animals of the National Institutes of Health. The use of animals in this study 
was approved by the South China Agricultural University Committee of Animal 
Experiments (approval number 201004152). 

Two different animal challenge experiments were conducted. Pigs were used without gender preference. In the first experiment, which was conducted before the virus was isolated, we used three-day-old specific-pathogen-free (SPF) piglets of the same breeding line, cared for at an SPF facility and fed with colostrum (except one). These piglets were bred and reared to be free of PEDV, CSFV, SIV, PCV2 and PPV infections, and were routinely tested for viral infections using PCR. We also conducted NGS to further confirm that these animals were free of infection with the above viruses before the animal experiment, and to demonstrate that the animals were free of SADS-CoV infection. Intestinal tissue samples from healthy and diseased animals (excised from euthanized piglets and ground into a slurry for the inoculum; NGS was performed to confirm that no other pig pathogens were present in the samples) were used to feed two groups of 5 (control) and 7 (infection) animals, respectively. For the second experiment, isolated SADS-CoV was used to infect healthy piglets from a farm in Guangdong that had been free of diarrhoeal disease for a number of weeks. These piglets were


from the same breed as those on SADS-affected farms, to eliminate potential host 
factor differences and to more accurately reproduce the conditions that occurred 
during the outbreak in the region. Both groups of piglets were cared for at a known 
pig disease-free facility. Again, qPCR and NGS were used to make sure that there 
was no other known swine diarrhoea virus present in the virus inoculum or any 
of the experimental animals. Two groups of three-day-old piglets (six per group) were inoculated with SADS-CoV culture supernatant or normal cell culture
medium as control. NGS and qPCR were used to confirm that there were no other 
known swine pathogens in the inoculum. 

For both experiments, animals were monitored daily for signs of disease, such as diarrhoea, weight loss and death. Faecal swabs were collected daily from all animals and screened for known swine diarrhoea viruses by qPCR. Weight loss was calculated as the percentage loss relative to the original weight at day 0, with a threshold of >5%. Note that three-day-old piglets tend to suffer from diarrhoea and weight loss when they are taken away from sows and the natural breast-feeding environment, even without infection. At experimental endpoints, piglets were humanely euthanized and necropsies performed. Pictures were taken to record gross pathological changes to the intestines. Ileal, jejunal and duodenal tissues were taken from selected animals and stored at −80 °C for further analysis.
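The weight-loss criterion above (percentage loss relative to day-0 weight, scored when it exceeds 5%) can be written as two small functions. This Python sketch assumes weights in kilograms, as in Extended Data Table 4b; the function names and the example weights are hypothetical.

```python
def percent_weight_loss(day0_kg, current_kg):
    """Weight lost since day 0, as a percentage of day-0 weight (negative means gain)."""
    if day0_kg <= 0:
        raise ValueError("day-0 weight must be positive")
    return (day0_kg - current_kg) / day0_kg * 100.0


def exceeds_threshold(day0_kg, current_kg, threshold_pct=5.0):
    """Apply the >5% weight-loss criterion described in the Methods."""
    return percent_weight_loss(day0_kg, current_kg) > threshold_pct


# Hypothetical weights (kg) for two piglets on day 0 and day 1.
print(percent_weight_loss(0.600, 0.540), exceeds_threshold(0.600, 0.540))
print(percent_weight_loss(0.600, 0.585), exceeds_threshold(0.600, 0.585))
```

Because separation from the sow alone can cause some weight loss in three-day-old piglets, the threshold is best read alongside the control group rather than in isolation.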
Haematoxylin and eosin and immunohistochemistry analysis. Frozen (−80 °C) small intestinal tissues including duodenum, jejunum and ileum taken from the experimentally infected pigs were pre-frozen at −20 °C for 10 min. Tissues were then embedded in optimal cutting temperature (OCT) compound and cut into 8-μm sections using the Cryotome FSE machine (Thermo Fisher Scientific).
Mounted microscope slides were fixed with paraformaldehyde and stained with 
haematoxylin and eosin for histopathological examination. 

For immunohistochemistry analysis, a rabbit antibody raised against the SADSr-CoV 3755 N protein was used for specific staining of SADS-CoV antigen. Slides
were blocked by incubating with 10% goat serum (Beyotime) at 37 °C for 30 min, 
followed by overnight incubation at 4 °C with the rabbit anti-3755 N protein serum 
(1:1,000) and mouse anti-cytokeratin 8+18+19 monoclonal antibody (Abcam), 
diluted 1:100 in PBST buffer containing 5% goat serum. After washing, slides were 
then incubated for 50 min at room temperature with Cy3-conjugated goat-anti- 
rabbit IgG (Proteintech) and FITC-conjugated goat-anti-mouse IgG (Proteintech), 
diluted 1:100 in PBST buffer containing 5% goat serum. Slides were stained with 
DAPI (Beyotime) and observed under a fluorescence microscope (Nikon). 
Reporting Summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 
Data availability. Sequence data that support the findings of this study have been 
deposited in GenBank with accession codes MF094681-MF094688, MF769416- 
MF769444, MF094697-MF094701, MF769406-MF769415 and MG557844. Raw 
sequencing data that support the findings of this study have been deposited in the 
Sequence Read Archive (SRA) with accession codes SRR5991648, SRR5991649,
SRR5991650, SRR5991651, SRR5991652, SRR5991654, SRR5991655, SRR5991656, 
SRR5991657, SRR5991658 and SRR5995595. 


27. Harlow, E. & Lane, D. Antibodies: A Laboratory Manual (Cold Spring Harbor 
Laboratory Press, New York, 1988). 

28. Ren, W. et al. Difference in receptor usage between severe acute respiratory 
syndrome (SARS) coronavirus and SARS-like coronavirus of bat origin. J. Virol. 
82, 1899-1907 (2008). 
Extended Data Fig. 1 | Map of outbreak locations and sampling sites in Guangdong province, China and the co-circulation of PEDV and SADS-CoV during the initial outbreak on farm A. a, SADS-affected farms are labelled (farms A-D) with blue swine silhouettes following the temporal sequence of the outbreaks. Bat sampling sites are indicated with black bat silhouettes. The bat SADSr-CoV that is most closely related to SADS-CoV (sample 162140) originated in Conghua. The red flag marks Foshan city, the site of the SARS index case. b, Pooled intestinal samples (n = 5 or more biologically independent samples) were collected at dates given on the x axis from deceased piglets and analysed by qPCR. The viral load for each piglet is shown as copy number per milligram of intestine tissue (y axis).
Extended Data Fig. 2 | Bayesian phylogenetic tree of the full-length genome and the ORF1a and ORF1b sequences of SADS-CoV and related coronaviruses. a, Bayesian phylogenetic tree of the full-length genome. b, Bayesian phylogenetic tree of the ORF1a and ORF1b sequences. Trees were constructed using MrBayes with the average standard deviation of split frequencies under 0.01. The host of each sequence is represented as a silhouette. Newly sequenced SADS-CoVs are highlighted in red, bat SADSr-CoVs are shown in blue and previously published sequences are shown in black. Scale bars, nucleotide substitutions per site.
Extended Data Fig. 3 | Phylogeny and haplotype network analyses of the 33 SADS-CoV strains from the four farms. a, Phylogenetic tree constructed using MrBayes. The GTR+GAMMA model was applied and 20 million steps were run, with the first 10% removed as burn-in. Viruses from different farms are labelled with different colours. Scale bar, nucleotide substitutions per site. b, Median-joining haplotype network constructed using PopART. In this analysis, ε = 0 was used. The size of the circles represents the number of samples; the larger the circle, the more samples it includes.
Extended Data Fig. 4 | Recombination analysis for SADS-CoV and related CoVs. The potential genetic recombination events were detected … sources of the genomes.
Extended Data Fig. 5 | Isolation and antigenic characterization of SADS-CoV. a, b, Vero cells are shown 20 h after infection with mock (a) or SADS-CoV (b). c, d, Mock or SADS-CoV-infected samples stained with rabbit serum raised against the recombinant SADSr-CoV N protein (red) and DAPI (blue). The experiment was conducted independently three times with similar results. Scale bars, 100 μm.
Extended Data Table 1 | List of all known swine viruses tested by PCR at the beginning of the SADS outbreak investigation on the four farms


PEDV PDCoV TGEV RV PBV PSV SVA SIV NADC30 PRV FMDV CSFV PCV2 PCV3 APPV PPV Norovirus


Farm A = < 7 7 - - : = = = 7 : ° . . ND a 
Farm B S - - = = < - 7 . = . - . : ND . 
Farm C : 7 - : - : ° : : : = 7 . Ss ° ND 
Farm D - - 7 . * . : - - . s - . . F = ND 


Faeces, intestine or faecal swabs collected from January to April 2017 were tested. Sampling type and number of samples per farm were as follows. Farm A: 1 faecal sample, 20 intestinal samples and 6 faecal swabs; farm B: 1 faecal sample and 15 intestinal samples; farm C: 2 intestinal samples and 1 faecal swab; farm D: 5 faecal samples and 1 faecal swab. A dash indicates a negative PCR result. ND, not determined. APPV, atypical porcine pestivirus; CSFV, classical swine fever virus; FMDV, foot and mouth disease virus; NADC30, porcine reproductive and respiratory syndrome virus, strain NADC30; PBV, porcine picobirnavirus; PCV2, porcine circovirus 2; PCV3, porcine circovirus 3; PDCoV, porcine deltacoronavirus; PEDV, porcine epidemic diarrhoea virus; PPV, porcine parvovirus; PRV, porcine pseudorabies virus; PSV, porcine sapelovirus; RV, porcine rotavirus; SIV, swine influenza virus; SVA, porcine senecavirus A; TGEV, porcine transmissible gastroenteritis virus.
Extended Data Table 2 | List of SADSr-CoVs detected in bats in Guangdong, China

Sampling location | Sampling time (month-year)
Yingde | Jun 13
Yangshan | Jul 13
Ruyuan | Jul 13; May 14; Jun 15; Aug 16
Conghua | Sep 14; Jun 15; Aug 16
Huidong | Jun 13; Nov 13; Aug 14; Jun 15
Baoan | Jun 15
Xiangzhou | Sep 14

Bat species sampled: Rhinolophus sinicus, Rhinolophus affinis, Rhinolophus macrotis, Rhinolophus pusillus, Rhinolophus rex, Hipposideros pratti, Hipposideros pomona, Myotis ricketti and Pipistrellus abramus.

PCR analysis columns: faecal swabs sampled; PCR positive. Total faecal swabs sampled: 591.

See Extended Data Fig. 1 for sampling sites in relation to SARS and SADS outbreak locations.
Extended Data Table 3 | Test of SADS-CoV entry and infection in HeLa cells expressing known coronavirus receptors

Virus | HuAPN* | HuACE2* | HuDPP4* | SwAPN* | SwACE2* | SwDPP4*
SADS-CoV | - | - | - | - | - | -
SARS-related-CoV | NA | + | NA | NA | + | NA
MERS-CoV† | NA | NA | + | NA | NA | NA
Expression‡ | +(S-tag) | +(His-tag) | +(S-tag) | +(S-tag) | +(S-tag) | +(S-tag)

*Gene accession numbers for the genes used in this study: human APN, M22324.1; human ACE2, NM_021804; human DPP4, NM_001935.3; SwAPN (swine APN), NM_214277.1; SwACE2 (swine ACE2), NM_001116542.1; SwDPP4 (swine DPP4), NM_214257.1.

†For MERS-CoV infection, HIV-pseudovirus was used.

‡Expression of APN, ACE2 and DPP4 was confirmed by antibodies against fused tags.
Extended Data Table 4 | Experimental infection of SPF piglets using intestine tissue homogenate


a
Group | Animal number | Age (days) | Inoculum material | SADS-CoV inoculum titer (copy/ml) | Inoculum volume | Inoculation route | Death | Weight loss | Watery diarrhoea | SADS-CoV +ve | PEDV/PDCoV/RV +ve
Infected | 7 | 3 | PCR-positive intestine slurry | 1.55×10^6 | 4 ml | Oral + milk | 0/7 (3/7) | 4/7 (5/7) | 5/7 (7/7) | 6/7 (7/7) | 0/7 (0/7)
Control | 5 | 3 | PCR-negative intestine slurry | 0 | 4 ml | Oral + milk | 0/5 (1/5) | 1/5 (3/5) | 0/5 (1/5) | 0/5 (0/5) | 0/5 (0/5)
Death, weight loss, watery diarrhoea and PCR results were recorded on day one and (day two) post-challenge.
b
Group: Infected (Piglet-I1*, Piglet-I2*, Piglet-I3*, Piglet-I4*, Piglet-I5†, Piglet-I6†, Piglet-I7*)
Day 0: 0.565 0.66 0.6 0.68 0.49 0.57 0.62
Day 1: 0.555 0.635 0.685 0.715 0.4 0.475 0.565
Day 2: 0.51 0.52 0.665 0.785
Group: Control (Piglet-C1*, Piglet-C2*, Piglet-C3*, Piglet-C4†*, Piglet-C5*)
Day 0: 0.67 0.59 0.5 0.53 0.525
Day 1: 0.765 0.53 0.49 0.51 0.535
Day 2: 0.765 0.53 0.575 0.505


Experimental details can be found in the Methods. a, Animals were recorded every day for signs of disease, including weight loss, diarrhoea and death. PCR on DNA from faecal swabs was carried out 
to monitor the presence of SADS-CoV or other pig viruses. b, Daily body weight record of all piglets. Weights are in kg. 

*Euthanized on the indicated day for further analysis. 

†Animal died during the experiment.

‡The only animal that did not receive colostrum in this experiment, owing to a shortage in supply.
Extended Data Table 5 | Experimental animal infection of farm piglets using cultured SADS-CoV


a
Group | Animal number | Age (days) | Inoculum material | SADS-CoV inoculum titer (TCID50/ml) | Inoculum volume | Inoculation route | Death | Weight loss | Watery diarrhoea | SADS-CoV +ve | PEDV/PDCoV/RV +ve
Infected | 6 | 3 | Cultured SADS-CoV | 10^6.625 | 6 ml | Oral + milk | 1/6 (3/6) | 4/6 (6/6) | 6/6 (6/6) | 6/6 (6/6) | 0/6 (0/6)
Control | 6 | 3 | Mock culture supernatant | 0 | 6 ml | Oral + milk | 0/6 (0/6) | 3/6 (3/6) | 5/6 (3/6) | 0/6 (0/6) | 0/6 (0/6)
Death, weight loss, watery diarrhoea and PCR results were recorded on day two and (day four) post-challenge.
b
Group: Infected (Piglet-I1†, Piglet-I2†, Piglet-I3*, Piglet-I4*, Piglet-I5*, Piglet-I6†)
Day 0: 1.5 1.54 2.32 1.92 1.54 2.165
Day 1: 1.41 1.575 2.58 1.885 1.46 2.08
Day 2: 1.23 1.39 2.615 1.73 1.54 1.365
Day 3: 2.115 1.54 1.335 1.725
Day 4: 1.505
Group: Control (Piglet-C1*, Piglet-C2*, Piglet-C3*, Piglet-C4*, Piglet-C5*, Piglet-C6*)
Day 0: 1.955 2.055 2.8 1.835 1.835 1.83
Day 1: 1.765 1.955 1.9 1.68 1.645 1.93
Day 2: 2.12 1.675 1.93 1.515 1.9
Day 3: 2.25 1.69 2.18 1.66 2.38
Day 4: 2.27 1.555 2.58


Experimental details can be found in the Methods. a, Animals were recorded every day for signs of disease, including weight loss, diarrhoea and death. PCR on DNA from faecal swabs was carried out 
to monitor the presence of SADS-CoV or other pig viruses. b, Daily body weight record of all piglets. Weights are in kg. 

*Euthanized on the indicated day for further analysis. 

†Animal died during the experiment.
MOUSE MODELS
WITH A HUMAN TOUCH 


Engineered mice are valuable for disease and drug research, but scientists hunger for 
cancer models that better mirror the condition in humans. 




Mice are commonly used to study cancer, but scientists are still working to improve modelling of the human disease. 


BY MIKE MAY 


In 1915, with the world at war, Japanese pathologist Katsusaburo Yamagiwa and his assistant Koichi Ichikawa were focused on a killer nearly as deadly as the battle raging on the Western Front. The duo, based at what was then the Imperial University of Tokyo, had spent more than 150 days painting coal tar on the ears of rabbits. Finally, they found that the rabbits had cancer.

Yamagiwa’s diseased rabbits are considered to have been the first animal model for cancer research’. Since then, scientists have used everything from cell lines to engineered mice to try to mimic human cancer. But finding the option best suited to answering a specific experimental question requires a lot of thought.

According to medical oncologist David Weinstock of the Dana-Farber Cancer Institute in Boston, Massachusetts, what makes for a good cancer model is “a very complex question, and the simplest answer is it must be able to give me insight — truly answer the question that I want to ask. If it can’t do that, I'm wasting my time.”

For Nancy Boudreau, a branch chief at the US 
National Cancer Institute (NCI) in Bethesda, 
Maryland, a model’s fidelity to the course of 
human cancer is key. “The more it recapitulates 
the human disease and progression, the better,” 
she says. An ideal cancer model should replicate 
many of the features that occur in human can- 
cer, including how it develops and progresses 
when facing a human immune system; how it 
metastasizes, or spreads from its primary source 
to other parts of the body; and how it reacts to 
therapy. That requires scientists to know the 
pros and cons of each cancer model, because 
none will answer every research question. 

Some evidence suggests that, despite the many options, no existing model of cancer is good enough for developing therapeutics. According to a report co-authored by the international Biotechnology Innovation Organization that examined clinical trials from 2006 to 2015, cancer drugs fared the worst out of 15 disease groupings, progressing from phase I to approval only 5.1% of the time (see go.nature.com/2pxfn16). By contrast, success rates for haematology and infectious-disease therapeutics were 26.1% and 19.1%, respectively.

“If better preclinical models could improve 
the clinical translatability by just 10%, that 
would very much improve the quality of pre- 
clinical cancer research and translate into 
enormous savings for drug developers,” says 
Hellmut Augustin, a specialist in vascular 
oncology at the German Cancer Research 
Center in Heidelberg. 

Groups worldwide are collaborating to improve these models. Scientists at the NCI, Cancer Research UK in London, the Wellcome Trust Sanger Institute in Hinxton, UK, and the not-for-profit Hubrecht Organoid Technology in Utrecht, the Netherlands, for instance, have teamed up on an effort called the Human Cancer Models Initiative. It launched in 2016 with the goal of developing 1,000 new cancer models in cell lines for use by researchers around the world. Such projects suggest that many scientists agree on the value of expanding the pool of models.


MODIFIED MOUSE GENOMES 

For many questions, the humble cultured cell 
provides sufficient insight. But these cells are 
typically grown in unnatural 2D formats that 
lack the conditions in which human cancers 
grow — especially, an immune system. This 
makes cultured cells ill-suited for modelling 
many aspects of disease. Instead, says cancer- 
systems biologist Shannon Hughes of the NCI, 
a good starting point for many investigations is 
a genetically engineered mouse (GEM). “They 
are well characterized and well controlled,” 
she says. 

For years, engineering a mouse required 
complicated processes to generate desired DNA, 
transform cells in culture and inject them into 
an embryo to modify its genes. But the options 
for making a GEM today, like most other 
genetic-modification applications, changed 
with the discovery of the CRISPR gene-editing 
system. “CRISPR has enabled more-subtle 
manipulations that were extremely challenging 
with previous technologies,” says cancer biolo- 
gist Lukas Dow of the Weill Cornell Medical 
College in New York City. 

“For instance,” Dow says, “it is now relatively straightforward to induce large chromosome rearrangements — inversions, deletions and translocations” — associated with disease. With CRISPR, scientists can even change a single base in a rodent’s DNA. Base-by-base resolution offers “the ability to accurately recreate the precise mutations observed in human cancer”, he notes. “Such detail has been largely ignored in model development thus far, but it is increasingly apparent that the devil is in the detail.”

Taeyoung Koo, a genome engineer at South Korea’s Institute for Basic Science, based in Daejeon, and her colleagues used CRISPR to target a mutation in non-small-cell lung cancer (NSCLC). They report that 15% of human NSCLC cases involve a change to just one DNA base, known as a single-nucleotide mutation, in the epidermal growth-factor receptor (EGFR) gene. Current treatment consists of drugs, such as gefitinib, that target the mutated protein produced by that gene.

Koo’s team developed a CRISPR-Cas9 guide-RNA sequence that recognizes the most commonly mutated EGFR region, which accounts for more than 40% of EGFR-mutation-related NSCLC cases. They then implanted mice with human NSCLC tumours and targeted the mutation with CRISPR-Cas9 and a specific guide RNA. Their results showed that a properly designed guide RNA is sufficiently precise to break the diseased sequence, yielding a potential therapeutic strategy. The “mutant allele-specific Cas9 can efficiently distinguish the EGFR mutant allele from the wild-type allele, leading to targeted oncogene disruption and cancer cell death”, they report’.

Although CRISPR shows remarkable tar- 
get specificity, the result of its activity can be 
highly variable. So, if the goal is to create con- 
sistent and uniform genetic changes across all 
cases, nucleases such as Cas9 are a poor choice, 
says Dow. “The random nature of DNA repair 
in traditional CRISPR systems means that 
you have to deal with a significant amount of 
heterogeneity in cell populations.” 

GEMs have limitations, too, especially con- 
cerning the timing and heterogeneity of dis- 
ease. “Mouse tumours progress incredibly fast,” 
Hughes explains. That speed enables research- 
ers to accelerate their experiments, but they fail 
to replicate the pace of disease in humans. Plus, 
she says, the tumours tend to be too homogene- 
ous to reflect human disease properly: a GEM 
usually includes just one or two genetic changes, 
whereas human tumours often have many. 

To address the lack of heterogeneity and produce a more human-like model, biomedical scientist Lorenzo Federico and his colleagues, working in the laboratory of systems biologist Gordon Mills at the University of Texas MD Anderson Cancer Center in Houston, engineered a collection of transplantable grafts from primary breast tumours in transgenic mice’. The procedure yielded 12 new graft lines of mice — mouse models that can reliably produce specific types of cancer with a wide array of genetic changes. “Ideally, different primary tumours arising in different mice should be characterized by different molecular alterations to more closely reflect the genetics of human cancer,” Federico says.

These models have already been used successfully as preclinical platforms for the assessment of targeted therapeutics, including inhibitors of molecular pathways involved in cancer. According to Federico, they are also well suited for studying the role of the immune system in tumorigenesis and therapeutics development. Yet, because these transplantable grafts were derived from engineered mouse tumours, he says, the results recorded from this approach “must be always taken with a grain of salt”.

How cancer arises and progresses depends intimately on its interaction with the host immune system. Some of the most promising treatments, called immunotherapies, engineer a patient’s immune system to attack a specific tumour. To study these therapies, scientists need mice with an intact immune system — better yet, a human one. That led to humanized mice.


Organoid options

Organoids are an increasingly popular model.

Mice aren’t the only options researchers have for modelling cancer. A popular emerging alternative is the organoid — a 3D cell culture that mimics some of the microanatomy of an organ, such as its system of blood vessels.

“A tumour is a kind of organ, where tissues cooperate,” says molecular biologist Claudine Kieda of the Centre for Molecular Biophysics in Orleans, France. “A 3D cell model takes into account the microenvironment, such as the level of oxygen around and in the tumour.”

Kieda’s lab combines melanoma and endothelial cells in a matrix composed of collagen, growth factors and a 3D scaffold called Matrigel. This mixture allows the cells to form a structure that resembles a tumour and its surroundings, especially in terms of oxygenation’. “Everyone is working in conditions that are like an incubator, where the partial pressure of oxygen is much higher than in the body,” Kieda says.

Among other uses, organoids are valuable for drug development. For instance, Meritxell Huch, a tissue-repair biologist at the University of Cambridge, UK, and her colleagues created liver-cancer organoids for drug screening®. This type of tumour can be grown in mice only about 20% of the time, but Huch achieved a success rate of nearly 80% — and the process worked about twice as quickly as with a patient-derived xenograft. “Speed is the main advantage,” Huch says. Using these organoids, Huch’s team identified an inhibitor of a signalling pathway that represents a potential target for treating primary liver cancer.

As with other cancer models, a good organoid replicates the human disease as faithfully as possible. Organoids, says Nancy Boudreau, a metastasis researcher at the US National Cancer Institute in Bethesda, Maryland, “are more biological than cells in regular culture”. And they are less expensive than mice. M.M.
For these GEMs, human haematopoietic stem cells — precursors to an array of blood cell types — are implanted into an immune-deficient mouse. This process recreates certain aspects of the human immune system, such as white blood cells called T cells, which attack foreign cells. Then, a sample of a human tumour — called a patient-derived xenograft (PDX) — can be transplanted into the mouse, creating a more realistic model of human disease.

According to Augustin, PDX models are increasingly popular among drug developers, who use them as test beds for candidate drugs. PDX models are also moving into basic-research labs, and are commercially available. Working with more than 20 cancer clinics, the Jackson Laboratory in Bar Harbor, Maine, has created more than 450 of these mouse models, including ones for acute myeloid leukaemia and bladder, breast, lung, ovarian and pancreatic cancer. They usually cost about three times as much as standard immune-deficient mice, the non-profit says.

Scientists can also develop their own PDX 
mice. Oncologist Elizabeth Stewart of St. Jude 
Children’s Research Hospital in Memphis, 
Tennessee, and her colleagues used sam- 
ples of surgically removed paediatric solid 
tumours, representing brain, bone and other 
cancers, to generate 67 PDX mouse models 
covering a dozen tumour types’. 

Stewart and her colleagues aimed to create
models for studying treatment efficacy against 
different tumour types — an approach that 
requires the model to represent the original 
disease accurately. Stewart's team decided to 
compare the PDX and source tumours at the 
nucleic-acid level using whole-genome and 
whole-exome DNA sequencing. Overall, they 
found, the PDX sequences largely matched 
the genomic features of the source tumours, 
although new mutations also emerged. The 
PDXs “retained the molecular and cellular fea- 
tures of the patient tumour and the epigenetic 
landscape of their developmental origins”, the
researchers concluded. 

That's not to say that PDXs are static. Todd 
Golub, director of the cancer programme at the 
Broad Institute of Harvard and MIT in Cam- 
bridge, Massachusetts, and his team studied 
genomic rearrangements called copy-number 
variations (CNVs) in 543 PDX models repre- 
senting 24 classes of cancer’. They found that 
expansive regions of CNVs constituting more 
than 5 million bases had been introduced into 
60% of the PDXs after 1 passage from the original mouse to a new host mouse, and 88% of PDXs
after 4 passages. The results show that PDXs 
that initially mimic human disease can evolve 
into forms that do not. When that happens, the 
PDX loses its faithfulness to the target cancer. 

Boudreau describes engrafting human PDX- 
model tissues into humanized mice as one of 
the most intriguing new cancer models to 
emerge, but says it’s “not quite there yet” because 
researchers have yet to learn the ins and outs of 
the technology. That said, she adds, the tech- 
nology could prove useful for one hot facet of 


therapeutics development: “Humanized mice 
will be pretty critical with the interest in immu- 
notherapy and how human tumours respond,” 
she says. 


BACK TO THE BEGINNING 

Rather than relying on genetic techniques to 
produce a better model of cancer, some scien- 
tists are going old school — using Yamagiwa’s 
approach. This chemical-carcinogenesis 
method uses ordinary lab mice, and the results 
can create more-realistic cancer models. 

“You treat a mouse with a carcinogen, like an environmental agent, to cause a specific type of damage and to get specific types of tumours, such as tumours in the skin,” explains tumour biologist Melissa Reeves at the University of California, San Francisco. “This does a good job of recapitulating tumours in humans exposed to specific environments, because it models the natural evolution of a tumour caused by a wide array of genetic damage.”

Chemical carcinogens can damage DNA at 
hundreds of sites, and their impact can be fol- 
lowed over time. Reeves and her colleagues 
took this approach, using topical applications 
of known carcinogens called DMBA and TPA 
to induce skin cancer in mice, to study how 
tumours move from a primary site to a second- 
ary one’. Her findings suggest that skin cancer 
does not travel serially from site to site — from 
skin to lymph nodes to lungs, for instance — but 
rather, “by parallel dissemination, going to the 
lymph nodes and lungs at the same time”. 

This finding, Reeves says, provides experimental validation of a well-documented clinical finding: that removing the lymph nodes around breast cancer doesn't always increase survival, an observation that led researchers to speculate about the possibility of parallel transmission.

Although chemical carcinogenesis cre- 
ates diseases that, compared with GEMs and 
humanized mice, might better resemble the 
heterogeneity of human cancer, actually using 
these models has significant downsides. It can 
take 18 months to create a primary tumour 
through chemical means, remove it and study 
the course of metastasis. “Plus, every tumour 
is going to be a little bit different,” Reeves says.

Likewise, every cancer model differs, and 
mice aren't always the best choice (see ‘Orga- 
noid options’). Mice are expensive to maintain 
and pose ethical concerns, which will always 
make cell lines an option to consider. 

For now, researchers must choose a 
model — despite its shortcomings — that they 
think will best answer their specific question. 
At the same time, scientists will keep advanc- 
ing existing models and developing new ones. 
As Augustin notes, “It is well-invested money 
to develop and employ mouse tumour models 
with better translational relevance and impact.” 
Otherwise, the performance of cancer drugs in 
clinical trials might never improve. ■


Mike May is a freelance writer based near 
Houston, Texas. 
RESEARCH ENTERPRISE


The rise of 
outsourcing 


Big pharma is downsizing, and contract research 
organizations are reaping the benefits. 


BY ESTHER LANDHUIS 


Alokta Chakrabarti manages drug-target identification and discovery for client pharmaceutical companies. As a project team leader at the contract research organization (CRO) ProQinase in Freiburg, Germany, she spends her days meeting clients, working at the bench, flying through data analysis — and chasing a lot of deadlines.

A few decades ago, drug makers did their 
own discovery work, along with every other 
element of getting a drug or medical device to 
the marketplace. But today, nearly anything that 
a pharmaceutical, biotechnology or medical- 
device business needs to do — from designing 


assays to planning and running clinical trials — 
can and may be outsourced to CROs. 

These specialized companies fall into several 
categories. Preclinical CROs test drug or device 
candidates for client businesses before the com- 
pounds or devices undergo clinical or human 
testing. This might include helping a client to 
synthesize compounds, run biochemical assays 
or conduct animal studies. Clinical CROs focus 
on clinical-trial services, such as medical writ- 
ing, data analysis, managing regulatory-affairs 
processes and other functions associated with 
getting new drugs or devices to market. A 
growing number of speciality CROs focus on a
particular stage of clinical development, or offer 
services within a specific therapeutic niche. 

The CRO industry is benefiting from recent 
downsizing trends in big pharma, as well as 
from a related proliferation of smaller drug 
makers, says Ken Getz, who studies research 
and development management practices at 
Tufts University in Boston, Massachusetts. The 
number of drugs entering clinical trials contin- 
ues to rise, and companies that are slashing their 
workforce look to CROs to help them manage 
their portfolios. The need for outsourcing is 
even greater for smaller biopharmaceutical 
firms with lean headcounts and scant clinical 
experience, says Getz. 

The global clinical CRO market topped 
US$23 billion in sales in 2014, and is predicted 
to exceed $35 billion in sales by 2020. More than 
one-third of all global drug-discovery research 
will be farmed out to CROs by 2021, predicts 
Kalorama Information, a market-research 
publisher in Rockville, Maryland, in its 2018 
Outsourcing in Drug Discovery report (see 
go.nature.com/2jgjoqb). 

As the CRO industry takes on larger and 
more-complex roles, the distinction between 
working in biopharma and at a CRO is blurring. 
Jobs in both areas are listed with life-sciences 
recruitment agencies and on job boards. Many 
people thought that few CROs could draw top 
researchers, says Josh Schultz, senior vice-pres- 
ident of Parexel, a full-service CRO headquar- 
tered near Boston, with a workforce of about 
19,000 across more than 50 countries. Now, 
he says, those researchers are more evenly distributed.
“CROs have become development partners in ways that we weren’t before now,”
he says. “Client companies say, ‘Can you take this compound from start to finish?’”


JOB BOOM 

About 15 years ago, the top 5 CROs world- 
wide collectively employed around 30,000 
people. Now, that group has nearly 100,000
employees, and a single CRO can have
thousands of clinical trials in progress at any 
given time, estimates Schultz. According to 
the Association of Clinical Research Organi- 
zations (ACRO) in Washington DC, whose 
members run trials in 142 countries, more 
than half of CRO jobs are in the United States 
and Europe. India has 8% and the United King- 
dom 7%, finds ACRO’s 2015 member survey. 

Researchers who have solid project- 
management and communication skills will 
be competitive for jobs at CROs — and could 
be even more strongly positioned if they have 
experience working with large data sets, say 
industry experts. 

Chakrabarti joined ProQinase in 2016 after 
completing a postdoc in histone epigenetics 
at the University of Freiburg; before that, she 
had done undergraduate work in India and a 
cancer-biology PhD in the Netherlands. So it 
was a challenge to shift her mindset into a busi- 
ness-oriented, deadline-driven approach, she 
admits (see go.nature.com/2j0xipn). “You do
your experiment, and if it does not work, you
try another,” she says of her PhD programme
and postdoc. “There is no deadline.”

Her CRO clients, however, often want results 
in one or two weeks. “You have to plan very 
wisely,” Chakrabarti says. “If there’s a problem, you have to troubleshoot. If you have too many projects, you end up with overlapping deadlines. It’s a little stressful.”

Many academic labs don’t emphasize these kinds of project-management skills. “Particularly in PhD programmes, there’s this culture saying you should learn how to do everything yourself,” says Elizabeth Iorns, chief executive of Science Exchange in Palo Alto, California, a network of CROs, core labs and other scientific-service providers that runs experiments for a fee. But in industry, she says, you need to highlight and burnish specific skills and talents. “It's impossible to be trained in every technique,” says Iorns.

Leaving academia requires a mental shift, 
Iorns says — less focus on creativity, invention 
and first-author papers, and more emphasis 
on discipline and quality control. Metrics for 
success also differ from those in an academic 
lab. “If your end goal is to bring a drug to mar- 
ket, you want to know as soon as possible that 
it’s not going to work,” she says. “Negative data
are just as valuable as positive data.” 

CROs can offer excellent opportunities for 
those pursuing an undergraduate degree, say 
some, and they can provide a crash course in 
lab techniques and interpersonal skills. That 
was the case for Nikita Patel, who started a 
job at a local preclinical CRO after graduating 
from the University of San Diego in California. 
“CROs are often fast-paced, cut-throat envi- 
ronments that teach you a lot in a short time,” 
Patel says. “Working there taught me to perse- 
vere and showed me the nitty-gritty nuances 
of doing science.” 

CROs can also be a good option for researchers who love science but hate benchwork. A few months into his master’s degree in biology at Sonoma State University in California, Brian Wenzel realized that he wasn’t suited for academia. “Lab work for me was really daunting,” he says. “If you screw up halfway through, you could compromise two years’ worth of data. It requires a certain kind of personality to be excited and tenacious enough to keep doing the same thing over and over with the precision needed for top-tier research.”

So after graduating, in 2008, Wenzel started as a customer-service representative at a contract manufacturing organization that specialized in protein research, and moved on to other CRO sales and business positions. Patel, too, pivoted to business development at her second CRO once her superiors saw that she could talk to people easily. She now works remotely from San Diego at Science Exchange, as a director of supplier relations.


A SHIFT IN FOCUS 

Historically, CROs created positions that 
mirrored the services needed by the pharmaceutical
sector. If a drug maker needed people
to write journal manuscripts, for example, the 
CRO would supply medical writers — or pro- 
ject managers, or clinical-trial managers, or 
whatever a potential or existing client company 
might have required. 

Now, however, large CROs are aiming to 
get ahead of the curve by providing data- 
management and data-analysis services, 
Getz says (see go.nature.com/2pnx2y5 and 
go.nature.com/2gg7sv9). Indeed, the land- 
scape looks good for those who are skilled in 
these areas. CROs and biopharma plan to hire 


25% more internal staff worldwide between 
now and 2020 for collecting, storing and mak- 
ing sense of the boatloads of data lurking in 
electronic health records (EHRs), social media 
and digital devices, according to a 2017 survey 
conducted by Tufts University’s Center for the 
Study of Drug Development, where Getz is 
based. 

Access to more-nuanced patient data is also 
enabling clinical-trial designs of greater com- 
plexity. As trial sponsors shift towards schemes 
that require different statistical and data-capture 
methods, researchers with those skills will be 
attractive to CROs, predicts Michael Winlo, 
chief executive of Linear Clinical Research, a 
mid-sized CRO in Nedlands, Western Australia, 
that specializes in early-stage clinical trials. 

“Can you think about ways of accelerating 
a trial? If we drop from five to four patients, 
do we still get the statistical power we need?” 
Winlo says. “There's a lot of mental brainpower 
in trial design — in the statistics, in the writing, 
in making sense of the data.” 

Today, Parexel and other large CROs are 
hiring more data scientists — informaticians, 
epidemiologists and other people who can 
work with large data sets and extract insights 
from them. Candidates who can glean key 
information from insurance claims, EHRs and 
other real-world data will be in great demand, 
Schultz says. “Ten years ago, we had 2 people 
who could do this. Now we need maybe 100,” 
he says. “It’s become a skill set we actively seek.”

Last year, Parexel bought Anolinx, a small,
speciality CRO in Salt Lake City, Utah. 
Normally, Schultz adds, the target CRO would 
have been too small to acquire, but its data sci- 
entists had exactly the skills that Parexel sought. 

Another example of innovation in trial 
outsourcing is Science 37, a company in Los 
Angeles, California, that functions as both 
a research ‘site’ and CRO, says co-founder 
Belinda Tan. The company conducts virtual
clinical trials through a telemedicine plat- 
form that allows researchers to easily find 
participants, who are able to avoid a trip to 
the clinic and get instructions from study 
staff through video calls at home. The 
platform serves as a data repository for all 
Science 37’s trials, and staff members have 
access to some, depending on their role. 
Tan says that, for her as a physician, the 
repository acts like a clinical-trial EHR for 
participants. 

Trial participants use mobile apps on 
smartphones provided by Science 37 to 
get their daily task list — for example, to 
complete a questionnaire or wait for a nurse 
to visit. To make these virtual trials work, 
Science 37 seeks not only conventional 
CRO candidates who have experience with 
clinical data, but also marketing and media 
specialists, web engineers, product design- 
ers, graphic designers and others. 

Salaries for CRO employees vary widely 
depending on the level of education and job 
responsibilities. Clinical-research associ- 
ates, who typically do not have PhDs, earn 
$50,000-65,000 on average in the United 
States, and clinical-research managers and 
clinical-research directors, who might have 
a doctorate, can earn more than $100,000. 

Because they work with multiple clients, 
CROs tend to offer job stability — if one 
project fails or ends suddenly, the company 
can shift flexibly to another project with a 
different sponsor. And because the work 
is fast-paced and varied, employees can 
often broaden their skill sets and climb the 
career ladder more quickly than they would 
working at a pharmaceutical company. 

Chakrabarti concedes that she misses one element of academic research. “You can follow a drug there from birth to clinic,” she says. Conversely, CRO scientists often work with many different drug candidates at varying stages of development. “You have confidentiality agreements with these client companies, so you don't know anything about the compound. You do the assay but you don't know what happens later,” she says. “Even when a molecule leaves a powerful impression — like, ‘this is the strongest inhibitor I've ever seen’ — your interest in a particular project has to stop with a particular deadline. This is what I find sad.”

At least once so far, however, a chance 
run-in has brought the process full circle for 
her. At a cancer-therapeutics conference in 
Philadelphia, Pennsylvania, last October, 
Chakrabarti saw one of her clients present- 
ing data about a familiar compound. She 
asked him if it was one that she had screened. 
Indeed, it was, he said, and the compound 
was heading into clinical trials.


Esther Landhuis is a freelance science 
journalist in the San Francisco Bay Area, 
California. 
Convert weaknesses into assets


Work out what you really enjoy doing, and pitch your 
skills accordingly, says Lia Paola Zambetti. 


“I'm afraid I won't renew your contract. I am giving you as much advance notice as I can so that you can find something else.”

Hearing these words from my supervisor's mouth left me reeling. As a native of Italy, and as a postdoctoral researcher in a nation outside the European Union, I had a visa that depended on my having a work contract. Without a job, I would have to leave the country shortly after the end of my contract.

Furthermore, the words felt like a death knell for my research career. Surely no one would ever hire me for a second postdoc when this one had failed to yield any research papers. What would I do in a few months' time when my postdoc ended? I was literally dizzy: I needed a strategy to find another position, and fast.

That was a tough week, but I am now grateful for that shocking announcement: it gave me clarity and enough time to make a plan.

The deadline made me think hard about my next steps. Somehow, I was able to start spelling out to myself what I emphatically did not want to do any more. It doesn't sound like the most logical step ever (surely, planning what you actually want to do makes more sense), but it was spectacularly helpful in clarifying my thoughts. Soon, I came up with a two-pronged strategy: first, look only for a research project that perfectly matches my wishes and skills; second, explore non-academic options as a real possibility, for the first time.

Because it looked increasingly likely that my future career was going to be outside academic research, I set out to turn my weaknesses into strengths. I aimed to turn all the points that my supervisors and potential employers had highlighted as faults for a researcher into strengths for non-lab-based jobs: a poor publication record; no specific research niche; a tendency to ‘waste time’ reading papers from very different fields; and indulging my passion for writing.

Because I couldn't count on papers to speak for my research, I decided to network more. I converted my lack of a speciality into a ‘broad and diverse background’ and an ability to speak knowledgeably to scientists from different fields. My keen interest in writing, seen by some as a time sink, nudged me towards jobs in editing and science writing, something I had considered only as a vague dream.

I was not sure whether a good occupational 
fit for me existed, but I still had a few months 
to find out, so I set up informational chats with 
nearly everyone I could think of. And, for the 
first time ever, I was always straightforward 
about what I was — and was not — looking 
for in my new role. 

One serendipitous talk on a Saturday led to a
meeting with the director of the institute where 
I was doing my postdoc, which in turn led to 
an informal chat with a senior representative 
from the institute's marketing and corporate 
communications unit. She had been tasked 
with forming a science-communication team 
on an institution-wide level, and wanted to 
recruit a scientist. 

Three months and two interviews later, the 
representative became my boss, and I had found
my perfect fit in a role that focused on science 
communication and editing and that was com- 
pletely away from the bench. As it happened, I 
also received an offer for a postdoctoral research 
project that aligned perfectly with my skills and 
interests. I regretfully felt obliged to decline it. 

In the end, although it took all the time I 
had, I got not one but two great offers. And 
both matched my skills and interests — all 
because I had been clear about what I no longer 
wanted and because I had turned my weaknesses into strengths.


Lia Paola Zambetti is a senior project officer at 
the University of Sydney’s Research Portfolio in
Australia. 
SCIENCE FICTION


E-PLURIBUS

Time to play your part.


BY S. R. ALGERNON


---- OFFICIAL SUMMONS ----


Dear Citizen,

Pursuant to the Presidential Leadership Utilizing Representative Individual Brains in Unified Simulation (PLURIBUS) Act, you are hereby given notice of your selection to serve as this district’s contribution to the collective leadership of the nation’s Executive Branch. Your term of service shall last for 14 days within the 3 months of August to October 2048.

You, along with 99 other registered voters, shall receive redacted briefings on domestic and foreign policy through a secure neural link. Your cognitive and affective responses to world events will inform the simulation’s decision-making algorithms. Any thoughts, images or subvocalized speech during the recording period may be picked up by the link and incorporated into executive actions, communiqués to foreign powers and addresses to the nation. The people and their representatives turn to you for guidance and leadership.

If you do not already have neural-interface hardware installed, please report to an authorized implantation centre on or before 15 July 2048. Alternatively, you may use the attached questionnaire to request a postponement or to be excused from service if:

1) You have already served two terms with PLURIBUS.

2) Your current residence is five or more light seconds from Earth.

3) You have a medical condition that contraindicates neural-link implantation.

4) You are under 35 years of age and are not cognitively augmented to a mental age of 35 or greater.

5) You are over 70 years of age and are not taking senescence blockers.

6) You are currently serving a term with the legislative or judicial simulations.

As you prepare for the start of your term, please keep the following in mind:

• You will not know which 14 days will contribute to the PLURIBUS network. This has been shown to produce more natural decision-making and to reduce anxiety for PLURIBUS contributors.
• Expect some cross-talk from other facets of PLURIBUS, especially in the first few days after implantation. Do not be alarmed. Most people quickly learn to differentiate their own thoughts from the stray thoughts of others.
• We request that you refrain from overly vigorous sexual activity during your term of office. We do our best to filter such things out, but we appreciate any effort you can make to ensure that PLURIBUS is not distracted or disturbed during what might turn out to be a crucial moment.
• Note that PLURIBUS comprises the neural patterns of 100 citizens at any one time. Please advise your friends and relatives not to hold you personally responsible for the nation’s foreign or domestic policy.
• The Board of Elections recommends that you inform as few people as possible of your appointment until after your term expires, so that you are not subject to undue influence.
• With regard to the last point, please pay particular attention to conflicts of interest and receipt of gifts from foreign powers.
• During your term of office, it is likely that you will experience newfound historical knowledge, understanding of world events, and appreciation of economics and global trade. This is a normal effect of the neural link and is not a cause for alarm.
• Any sudden preference for tricornes, stovepipe hats or other eccentric headdress is not an effect of the neural link. We have no idea why this occurs.
• On any particular issue, you may find that PLURIBUS’s final decision differs from your own view, or that your perspective was not adequately considered. Rest assured that PLURIBUS uses the latest algorithms. Any reports you may have read about bias or hacking are speculative and without foundation.
• During your term in office, you may feel overwhelmed by the pressures of the job. Remember that the burden does not rest entirely on your shoulders. You and your 99 other colleagues will depend on each other. It is not like the old days, when we entrusted the leadership of the executive branch to a single, flawed, human being.
• At the end of your term in office, you may find yourself feeling disconnected, or feel that your life no longer has the scope or meaning that it once had. Remember that life goes on after the presidency. Please consult the links at the end of this summons for advice on writing a presidential memoir and for links to several e-book providers that can help you create a personalized presidential library.
• Rarely, some PLURIBUS contributors report feeling that they are in the ‘wrong body’ or that they were ‘switched’ upon disengaging from the PLURIBUS network. Regrettably, we can only offer counselling to help you adjust to your new life after your term ends. As you may know, the Supreme Court is deadlocked on this issue. For the time being, we can only legally recognize you on the basis of your corporeal form.
• You are free to use any social media accounts you may possess during your term in office. However, only the PLURIBUS collective has access to the official presidential accounts. Your individual e-mails, blog posts and tweets will not be considered presidential communications, for historical reasons.

As we would with any holder of this high office, we urge you to take the matter seriously and to set aside your own personal interests for the good of the country. You were selected on the basis of a profile derived from millions of ballots. You reflect the collective will of the people. No one person can do it alone; but together, we can, with foresight, wisdom, teamwork and empathy, achieve greatness.


S. R. Algernon studied fiction writing and biology, among other things, at the University of North Carolina at Chapel Hill. He currently lives in Singapore.
SPOTLIGHT ON CANCER | CAREERS


How team science 
extends your scope 


Modern cancer research demands input from collaborators across a broad spectrum of disciplines, not all of them scientific.


BY NEIL SAVAGE


When Daniel Stover was doing his postdoctoral research in a cell-biology lab at Harvard Medical School in Boston, Massachusetts, he ran into a problem. He was studying a type of breast cancer, trying to work out whether genetic differences between one part of a tumour and another contributed to the cancer’s resistance to chemotherapy. He had plenty to work with (genetic information from hundreds of tumour samples) but no idea how to handle it all.

“I had generated an immense amount of sequencing data and couldn't find anybody to analyse it,” says Stover, now an oncologist at the Ohio State University’s Comprehensive Cancer Center in Columbus. So, with the help of a bioinformatician in the same lab, he started studying computational biology, which became the focus of his studies. “I found that I loved working with data,” he says. All the papers he published as a postdoc ended up being based on informatics, and now his own lab, which he set up last September, focuses on clinical computational oncology.

Stover’s lab aims to fill the space between the 
computer experts who develop data-handling 
algorithms and the clinicians who focus on 
patient care, treatment and clinical trials. “In 
between, there’s a gap, and we try to fill that 
void and take these amazing algorithms and 
apply them in clinical settings,” he says. 

Stover says the collaboration changed the 
direction of his career, in part because it gave 
him new skills that he could apply in working 
with other researchers. 


Cancer research has become highly multi- 
disciplinary. The field now includes not just 
clinicians and molecular biologists, but also 
computational biologists, statisticians, nano- 
technology experts and chemical engineers. 
And that creates challenges for all those 
researchers. How do they work with people 
who have different areas of expertise, each 
with its own basic assumptions and special- 
ized language? 

Nancy Krunic, who works for Novartis Phar- 
maceuticals in Cambridge, Massachusetts, 
heads the company’s Future Precision Medicine 
Diagnostics group, which is developing assays, 
software and other technology to aid in diag- 
nosis. “No one person or one department, or 
one lab, is going to have all the tools they need 
to tackle the problem,” she says. “You absolutely need diverse backgrounds and subject-matter expertise.”

Whether they're big pharmaceutical organizations or medical-device companies (Krunic previously worked at Luminex Molecular Diagnostics in Toronto, Canada), industry groups targeting cancer must form multidisciplinary teams, Krunic says, if they are to define and tackle problems in ways that are scientifically, clinically and commercially viable. As well as scientists and technologists, these teams will include people with expertise in, for example, marketing and regulatory issues, says Krunic.


BEYOND BIOLOGY

Programmes exist to promote cross-fertilization between disciplines. The US National Cancer Institute (NCI), for example, established a Physical Sciences in Oncology initiative in 2009 to team cancer biologists with physicists, mathematicians, chemists and engineers. Those disciplines come at cancer in a variety of ways. Chemical engineers devise new diagnostics and develop nanoparticles to carry drugs to tumours, or to act as contrast agents that make smaller tumours visible in imaging. Physicists and bioengineers study the effect of mechanical forces on tumour growth and behaviour, and mathematicians develop computational models to explain the complex interplay between different cancer cells, blood vessels, healthy tissue and drugs.

For example, researchers are working to understand the physical effects of a tumour’s environment. How does an increase in tumour stiffness affect the shape and behaviour of the cells within it? And when a metastasizing cell deforms to squeeze through tight spaces, what does the increased pressure do to the cell's nucleus? Does it, for instance, trigger processes that damage DNA? “It’s not just the physical forces, but that’s an important aspect of what’s being studied,” says Nastaran Zahir, director of the Physical Sciences in Oncology programme. Other projects include applying mathematical approaches such as game theory to determine dosing strategies that will minimize the development of drug resistance, instead of applying the standard ‘maximum tolerated dose’ approach.


MAKING IT WORK

‘Prenuptial agreements’ for scientists

To help researchers to collaborate in multidisciplinary groups and anticipate difficulties in a project, the Office of the Ombudsman at the US National Institutes of Health came up with what it calls a prenuptial agreement for research teams. This lays out areas in which teams should reach agreements before problems arise (see go.nature.com/prenups). Here are some of the questions it suggests asking and answering before a collaboration begins in earnest:

• What are the scientific goals and expected outcomes of the project?
• When will the project be over?
• Who will write the reports?
• How will you decide what to do if discoveries made during the project change the direction of your research?
• Who will do the hiring, firing and supervising?
• How will credit and authorship be assigned?
• How will you make decisions about new collaborations or spin-off projects?
• What will you do about patents and intellectual property?
• Who will manage the data?
• What will happen if a collaborator changes job during the project?

Zahir has experience of crossing disciplines. 
She earned her bachelor’s degree in nuclear 
engineering, and studied plasma physics before 
moving into radiation biology and getting her 
PhD in bioengineering in a cancer research lab. 
So she's aware of the difficulties. “Biology has 
its own culture. Physics has a different culture,” 
she says. “In physics, what you search for is sort 
of the ultimate truth: is there a law? But biology’s
very messy, and you don’t necessarily have
an exact process.” Because biological processes 
change in response to new stresses, it’s difficult 
to come up with laws for how a targeted cell 
would react to a cancer drug, for example. 


LANGUAGE BARRIERS 

To help bridge such gaps between disciplines, 
the NCI created the Science of Team Science 
programme. Kara Hall, a behavioural scientist 
who directs the initiative, says it’s important 
for people to share knowledge with those from 
other disciplines in a comprehensible way. 
“That entails reducing the jargon that’s being 
used, or finding ways to define that jargon as 
you go along,” she says. It is often helpful to use 
analogies to explain key concepts in a field. It's
also useful for researchers to engage in ‘team 
learning’ in which individuals are tasked with 
gaining in-depth information on a topic and 
bringing it back to their colleagues. Hall says 
teams should reflect on how well they func- 
tion, by discussing, for instance, whether 
their meetings are sufficiently frequent and 
informative. 

Hall says that people must be open when 
they approach specialists in other fields. It’s 
important, she advises, to practise ‘disciplinary 
humility’: to realize that all disciplines have
both strengths and weaknesses, and to be will- 
ing to learn from fields other than your own. 


Finding a safe common ground to ask questions
can be difficult. “If I'm a psychologist
collaborating with a geneticist, I may be afraid 
to ask a ‘genetics 101’ question because I might 
be seen as intellectually inferior,” Hall says.

Other challenges in team science include the 
need for extra planning and management time, 
compared with individualized research. The 
approach can also require more team meet- 
ings and more travel, when collaborators are 
located across campus or at other institutions. 
And some institutional structures have not yet 
caught up with the concept, Hall says. Promo- 
tion and tenure committees tend to look mainly 
at the first and last authors of papers, she says. 
And that means they might not recognize how 
much a middle author has contributed — even 
though, in teams, middle authors play a cru- 
cial part in the research. Yet, Hall says, her pro- 
gramme’s surveys found that trainees who had 
worked in multidisciplinary teams reported 
that their experiences had made them more 
competitive in the job market. 

Defining the goals of a project, planning its 
implementation and working out in advance 
how to resolve conflict are all important parts 
of setting up a collaboration. The NCI offers 
the Team Science Toolkit, an online resource 
whereby researchers can share information 
and post news about funding opportunities and job
openings (see go.nature.com/tstoolkit). It also helps to plan and support an
annual Science of Team Science conference, 
which focuses on ways to make team-based 
research more effective. The next meeting 
runs from 21 to 24 May at the University of 
Texas Medical Branch in Galveston. And the 
US National Institutes of Health offers what it 
calls a “prenuptial agreement” to help scientists 
prepare for problems that can arise during a 
collaboration (see ‘Making it work’). 

One early-career researcher taking a multi- 
disciplinary approach is Viktor Adalsteinsson, 
who leads the blood-biopsy team in the cancer 
programme at the Broad Institute of MIT and 
Harvard in Cambridge. Although he earned 
his doctorate in chemical engineering, in 
2015, Adalsteinsson knew from a young age 
that he wanted to help cure cancer. He did his 
PhD work at the Koch Institute for Integra- 
tive Cancer Research, which was set up at the 
Massachusetts Institute of Technology to bring 
biologists and engineers together under one 
roof. He helped to develop a system to isolate 
and sequence circulating tumour cells from 
blood samples, using his chemical-engineer- 
ing education to deal with issues such as fluid 
dynamics and the amount of shear stress that 
cells could handle. Now in his own lab, he's try- 
ing to capture cell-free cancer DNA from blood 
to perform sequencing for precision medicine, 
reducing the need for invasive biopsies. 

One of the ways in which Adalsteinsson 
and the people he works with stay up to date 
is through frequent meetings and seminars, 
at which various specialists talk about their 
work. Having a network of colleagues who can 
explain research from other disciplines, or tell
him whether a journal article is significant, is 
helpful, he says. “It’s impossible to be an expert 
in every possible area, and knowing when to 
turn to others is really important.”


POOL EXPERTISE 

Sometimes the trick lies in knowing what not to 
read. “Being able to scan and reject a bunch of 
stuff is really important,” says Heather Parsons,
a medical oncologist and physician at the 
Dana-Farber Cancer Institute in Boston, who 
specializes in breast cancer and its biomarkers. 
She, too, emphasizes the importance of having 
a network of experts, developed through uni- 
versity and work, with whom you can discuss 
questions. 

Parsons collaborates with Stover and 
Adalsteinsson on the liquid-biopsy work. “I like
very much being part of this kind of a team,” 
she says, “but it requires that you don’t have an 
enormous ego and you don't mind asking about 
things you don’t understand.” 

At Stanford University School of Medicine in 
California, Guillem Pratx gets members of his 
physical oncology lab to take part in a journal 
club. They meet for an hour or so to focus on a 
particular paper, allowing people from differ- 
ent disciplines to gain a good understanding of 
its importance. He also requires them to attend 
meetings outside their field to broaden their 
knowledge. With enough exposure, he says, 
scientists can become comfortable with the 
terminology and concepts used in other areas. 
“I notice the more I sit in these talks, the more
I understand,” Pratx says. “It’s like learning a 
new language.” 

Pratx did his undergraduate and graduate 
studies in electrical engineering, and during his 
PhD studies he worked in a radiology lab, using
graphics techniques from computer games to 
improve the processing of medical images. 
He did his postdoctoral research in radiation 
oncology, and he feels that using postdoc time 
to learn about an area outside one’s core speciality
can pay off. It can be difficult to be hired by a
lab that specializes in a field far removed from 
yours, he acknowledges. But if there's some 
overlap, it can add valuable expertise. 

Pratx’s lab, which includes scientists with 
backgrounds in physics, engineering, chemis- 
try and biology, develops instruments, probes 
and algorithms for cancer imaging. The team 
is studying how the luminescence generated 
when therapeutic radiation hits tissue can be 
used to carefully aim the otherwise-invisible 
beam. One challenge for such multidiscipli- 
nary teams is communicating to different 
members how they can tackle a problem, he 
says. Biologists often struggle to understand 
what questions mathematical models can ask 
concerning the large data sets generated in 
cancer research — sets that include not only 
genomic and proteomic sequences, but also 
imaging results and environmental informa- 
tion from medical records. It’s important that 
there’s someone in the group who understands 
which statistical methods are best applied to
particular types of data, and what the results 
do and don't show, says Pratx. 

On the flip side, he thinks that engineers 
can focus too much on trying to come up with 
innovative techniques, and are sometimes less 
interested in applying what others have already 
developed. It’s not enough for insights gleaned 
from data to be new, he says. They also have to 
be biologically relevant. 

One problem that Pratx sees is the one 
Stover experienced. Although the growth in 
data is increasing the need for computational 
specialists in cancer research, the competition 
from other fields for people with those skills 
is strong. 


MATCHMAKERS 

Early-career researchers interested in forming 
collaborations need to network with people 
from other fields, and one obvious way is to 
attend conferences in those fields. But Jennifer 
Podesta, a molecular biologist and a specialist 
in the use of nanotechnology for drug delivery, 
says that simply attending a conference isn't 
enough. “Do a little bit of homework, and go in
very much with an agenda of ‘who it is I want to
meet and what do I want to get out of it?’” she
says. “It’s remarkable how many people think 
they can show up, scrunch over and stand in 
the corner, and come away from it complaining 
that they didn’t meet a collaborator.” 

Podesta, who runs the Cancer Research UK 
Centre at Imperial College London, recom- 
mends working out the type of scientist you 
need for the project you have in mind, and 
then approaching department heads in your 
own university to see who they think might fit. 
Funding managers also tend to have a broad 
knowledge of which researchers have what 
expertise, and are usually happy to play match- 
maker. 

Getting funding for cross-disciplinary pro- 
jects can be challenging, especially for some- 
one who hasn’t yet established a reputation,
so Podesta suggests looking for small sums of 
money internally, to fund a pilot project with 
a new collaborator. Such projects demonstrate 
that members of the team can work together 
and produce viable ideas, making them more 
attractive to funding agencies. The NCI’s 
Physical Sciences in Oncology programme 
provides funding specifically for pilot projects. 

Trying to keep up with a field as dynamic as 
cancer research is daunting. “We have so much 
information within our reach, and new discoveries
are being made every day,” Adalsteinsson
says. The key to tackling all that information, 
Pratx says, is to overcome the tendency of many 
scientists to think they need to learn everything 
themselves. “I think it’s an important skill when 
you're able to say, ‘Maybe I don't need to be an 
expert in computer modelling. I can maybe 
work with somebody else,’” he says.


Neil Savage is a freelance science and 
technology writer in Lowell, Massachusetts. 